HFST - Helsinki Finite-State Transducer Technology - Python API  version 3.11.0
Using HFST with SFST, OpenFst and foma

A binary HfstTransducer consists of an HFST header (more on HFST headers on the wiki pages) and the transducer of the backend implementation.

To write a transducer in its plain backend format, without the HFST header, set the hfst_format keyword argument of the HfstOutputStream constructor to False (the default, True, writes the header):

    HfstOutputStream(hfst_format=False)

The following piece of code will write a native OpenFst transducer with tropical weights to standard output:

test.py:

 import hfst
 ab = hfst.regex('a:b::2.8')
 out = hfst.HfstOutputStream(hfst_format=False)
 out.write(ab)
 out.flush()
 out.close()

run on the command line (fstprint is a native OpenFst tool):

 python test.py > ab.fst
 fstprint ab.fst

output:

 0       1       a       b       2.79980469
 1
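
To check from Python which of the two layouts a file uses, one can peek at its first bytes. The sketch below assumes that an HFST header begins with the ASCII bytes HFST followed by a NUL byte; this layout is an assumption that is not stated on this page and should be verified against the wiki pages. The helper name has_hfst_header is hypothetical and is not part of the hfst module.

```python
def has_hfst_header(path):
    """Return True if the file starts with the assumed HFST header magic."""
    # Assumption (not confirmed on this page): an HFST header
    # begins with the five bytes b"HFST\x00".
    with open(path, "rb") as f:
        return f.read(5) == b"HFST\x00"
```

Under that assumption, the file 'ab.fst' written above with hfst_format=False would give False, while the same transducer written with the default hfst_format=True would give True.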

An hfst.HfstInputStream can also read backend transducers that do not have an HFST header. If we have the following files

symbols.txt:

 EPSILON 0
 a 1
 b 2

ab.txt:

 0 1 a b 0.5
 1 0.3

test.py:

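 import hfst
 # With no filename argument, the stream reads from standard input.
 istr = hfst.HfstInputStream()
 # A stream may hold several transducers; read until it is exhausted.
 while not istr.is_eof():
     tr = istr.read()
     print('Read transducer:')
     print(tr)
 istr.close()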

the command

 cat ab.txt | fstcompile --isymbols=symbols.txt --osymbols=symbols.txt --keep_isymbols --keep_osymbols | python test.py

will compile a native OpenFst transducer (fstcompile is a native OpenFst tool), read it with HFST tools and print it to standard output in AT&T text format:

 Read transducer:
 0       1       a       b       0.500000
 1       0.300000

For more information on HFST transducer formats and conversions, see the wiki pages.

An issue with foma

Foma writes its binary transducers in gzipped format using the gz tools. However, we ran into problems when trying to write to standard output or read from standard input with the gz tools (the foma tools themselves do not write to or read from standard streams). We therefore chose to write, and accordingly read, foma transducers unzipped when writing or reading binary HfstTransducers of type hfst.types.FOMA_TYPE. As a result, when an HfstTransducer of FOMA_TYPE is written in its plain backend format, the user must zip it before foma tools can use it. (Update: at least the newest releases of foma can also read unzipped transducers.) Similarly, a foma transducer must be unzipped before HFST tools can read it.

Suppose we have written a FOMA_TYPE HfstTransducer and want to use it with foma tools. First we write it, in its plain backend format, to file 'ab.foma' with the following piece of code:

 import hfst
 hfst.set_default_fst_type(hfst.types.FOMA_TYPE)
 ab = hfst.regex('a:b')
 out = hfst.HfstOutputStream(filename='ab.foma', hfst_format=False)
 out.write(ab)
 out.flush()
 out.close()

The command

 gzip ab.foma

will create a file 'ab.foma.gz' that can be used by (older) foma tools.

The same with command line tools:

    echo 'a:b' | hfst-strings2fst -f foma > ab.hfst
    hfst-fst2fst --use-backend-format -f foma -i ab.hfst > ab.foma
    gzip ab.foma
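
The compression step can also be done from Python with the standard library alone, which may be convenient when the transducer is written from a script. This is a minimal sketch, not part of HFST: the function name gzip_for_foma is hypothetical, and it mimics the default behaviour of the gzip command by removing the uncompressed file afterwards.

```python
import gzip
import os
import shutil


def gzip_for_foma(path):
    """Compress 'path' to 'path.gz' and remove the original, like `gzip path`."""
    gz_path = path + ".gz"
    # Stream the plain file into a gzip container in chunks.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # The gzip command deletes its input by default; do the same here.
    os.remove(path)
    return gz_path
```

Calling gzip_for_foma('ab.foma') after the output stream has been closed leaves a file 'ab.foma.gz' in place of 'ab.foma'.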

An example of the opposite case follows. Suppose we have a foma transducer 'transducer.foma' and want to read it inside an HFST program. A .gz extension must first be appended to the file name so that the program 'gunzip' recognizes it as a zipped file. The commands

 mv transducer.foma transducer.foma.gz
 gunzip transducer.foma.gz

overwrite the original file 'transducer.foma' with an unzipped version of the same file. Now the file can be used by HFST:

 import hfst
 instr = hfst.HfstInputStream('transducer.foma')
 tr = instr.read()
 instr.close()

The same with command line tools:

    mv transducer.foma transducer.foma.gz
    gunzip transducer.foma.gz
    hfst-sometool transducer.foma
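
Because newer foma releases may produce or accept unzipped files, it is not always obvious whether a given foma file is zipped. The sketch below, which uses only the Python standard library, checks for the two-byte gzip magic number (0x1f 0x8b) and writes an unzipped copy either way. The function name unzip_foma and the separate destination path are choices made for this illustration and are not part of HFST.

```python
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def unzip_foma(src_path, dst_path):
    """Write an unzipped copy of src_path to dst_path.

    Returns True if src_path was gzipped, False if it was copied as is.
    """
    if os.path.abspath(src_path) == os.path.abspath(dst_path):
        raise ValueError("source and destination must be different files")
    with open(src_path, "rb") as f:
        zipped = f.read(2) == GZIP_MAGIC
    # Pick the matching opener; both return binary file objects.
    opener = gzip.open if zipped else open
    with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return zipped
```

The copy written to dst_path can then be opened with hfst.HfstInputStream in the same way as 'transducer.foma' above.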