HFST - Helsinki Finite-State Transducer Technology - Python API
version 3.12.3 (under development)
A binary HfstTransducer consists of an HFST header (more on HFST headers on the wiki pages) and the transducer of the backend implementation.
If you want to write backend transducers as such, that is, without the HFST header, set the hfst_format keyword argument of the HfstOutputStream constructor to False:
    HfstOutputStream(hfst_format=False)
The following piece of code will write a native OpenFst transducer with tropical weights to standard output:
test.py:
    import hfst
    ab = hfst.regex('a:b::2.8')
    out = hfst.HfstOutputStream(hfst_format=False)
    out.write(ab)
    out.flush()
    out.close()
run on the command line (fstprint is a native OpenFst tool):
    python test.py > ab.fst
    fstprint ab.fst
output:
    0 1 a b 2.79980469
    1
An hfst.HfstInputStream can also read backend transducers that do not have an HFST header. If we have the following files
symbols.txt:
    EPSILON 0
    a 1
    b 2
ab.txt:
    0 1 a b 0.5
    1 0.3
test.py:
    import hfst
    istr = hfst.HfstInputStream()
    while not istr.is_eof():
        tr = istr.read()
        print('Read transducer:')
        print(tr)
    istr.close()
the commands
cat ab.txt | fstcompile --isymbols=symbols.txt --osymbols=symbols.txt --keep_isymbols --keep_osymbols | python test.py
will compile a native OpenFst transducer (fstcompile is a native OpenFst tool), read it with HFST tools and print it to standard output in AT&T text format:
    Read transducer:
    0 1 a b 0.500000
    1 0.300000
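The AT&T text output above has two kinds of lines: transitions (source state, target state, input symbol, output symbol, optional weight) and final states (state, optional weight). A minimal sketch of reading such text in plain Python follows. The function name parse_att is hypothetical and not part of the HFST API, and the sketch assumes a single weighted or unweighted transducer whose symbols contain no whitespace.

```python
from typing import Dict, List, Tuple

# (source state, target state, input symbol, output symbol, weight)
Transition = Tuple[int, int, str, str, float]


def parse_att(text: str) -> Tuple[List[Transition], Dict[int, float]]:
    """Parse one transducer in AT&T text format (hypothetical helper).

    Returns the list of transitions and a mapping from each final
    state to its final weight. A missing weight is read as 0.0.
    """
    transitions: List[Transition] = []
    finals: Dict[int, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) in (4, 5):
            # Transition line: source target input output [weight]
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            transitions.append(
                (int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
        elif len(fields) in (1, 2):
            # Final-state line: state [weight]
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            raise ValueError("unrecognized AT&T line: %r" % line)
    return transitions, finals


transitions, finals = parse_att("0 1 a b 0.500000\n1 0.300000\n")
print(transitions)  # [(0, 1, 'a', 'b', 0.5)]
print(finals)       # {1: 0.3}
```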
For more information on HFST transducer formats and conversions, see the wiki pages.
Foma writes its binary transducers in gzipped format using the gz tools. However, we experienced problems when trying to write to standard output or read from standard input with the gz tools (foma tools do not write to or read from standard streams). We therefore chose to write, and accordingly read, foma transducers unzipped when writing or reading binary HfstTransducers of type hfst.ImplementationType.FOMA_TYPE. As a result, when an HfstTransducer of FOMA_TYPE is written in its plain backend format, the user must gzip it themselves before foma tools can use it. (Update: at least the newest releases of foma can also read unzipped transducers.) Similarly, a foma transducer must be unzipped before HFST tools can read it.
Suppose we have written a FOMA_TYPE HfstTransducer and want to use it with foma tools. First we write it, in its plain backend format, to file 'ab.foma' with the following piece of code:
    import hfst
    hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
    ab = hfst.regex('a:b')
    # write to file 'ab.foma' without the HFST header
    out = hfst.HfstOutputStream(filename='ab.foma', hfst_format=False)
    out.write(ab)
    out.flush()
    out.close()
The command
gzip ab.foma
will create a file 'ab.foma.gz' that can be used by (older) foma tools.
The same with command line tools:
    echo 'a:b' | hfst-strings2fst -f foma > ab.hfst
    hfst-fst2fst --use-backend-format -f foma -i ab.hfst > ab.foma
    gzip ab.foma
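The gzip step can also be done from Python with the standard library, which may be convenient when the transducer is written by a Python script in the first place. The following is a minimal sketch. The helper name gzip_file is hypothetical, and unlike the gzip command it leaves the original file in place.

```python
import gzip
import shutil


def gzip_file(path: str) -> str:
    """Write a gzip-compressed copy of `path` to `path + '.gz'`.

    Hypothetical helper: the original file is kept, whereas the
    `gzip` command line tool replaces it. Returns the new file name.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the bytes through the compressor
    return gz_path
```

Calling gzip_file('ab.foma') after the transducer has been written would produce 'ab.foma.gz' next to it.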
An example of the opposite case follows. Suppose we have a foma transducer 'transducer.foma' and want to read it inside an HFST program. A .gz extension must first be appended to the file name so that the program 'gunzip' recognizes it as a zipped file. The commands
    mv transducer.foma transducer.foma.gz
    gunzip transducer.foma.gz
overwrite the original file 'transducer.foma' with an unzipped version of the same file. Now the file can be used by HFST:
    import hfst
    instr = hfst.HfstInputStream('transducer.foma')
    tr = instr.read()
    instr.close()
The same with command line tools:
    mv transducer.foma transducer.foma.gz
    gunzip transducer.foma.gz
    hfst-sometool transducer.foma
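Since a foma file may or may not be gzipped depending on where it came from, it can help to check before unzipping. A gzip file starts with the two bytes 0x1f 0x8b, so the check is cheap. The sketch below uses only the Python standard library. The names is_gzipped and ensure_unzipped are hypothetical helpers, not part of HFST or foma.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def is_gzipped(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_unzipped(path: str, out_path: str) -> str:
    """Return the name of an uncompressed version of `path`.

    If `path` is gzipped, its contents are decompressed into `out_path`
    and `out_path` is returned. Otherwise `path` is returned unchanged
    and nothing is written.
    """
    if not is_gzipped(path):
        return path
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The returned file name could then be passed to hfst.HfstInputStream as in the example above. Writing to a separate output file, rather than renaming and overwriting, keeps the original foma file usable by older foma tools.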