HFST - Helsinki Finite-State Transducer Technology - C++ API
version 3.9.1
A binary HfstTransducer consists of an HFST header (more on HFST headers on the wiki pages) followed by the transducer in the format of its backend implementation. If you want to write a transducer in its plain backend format, without the HFST header, set the hfst_format argument of the HfstOutputStream constructor to false:
HfstOutputStream(ImplementationType type, bool hfst_format=true);
The following piece of code will write an OpenFst transducer with tropical weights to standard output:
HfstTransducer ab("a", "b", TROPICAL_OFST_TYPE);
HfstOutputStream out(TROPICAL_OFST_TYPE, false);
out << ab;
An HfstInputStream can also read backend transducers that do not have an HFST header. If the standard input contains an SFST transducer, the following piece of code will read it successfully and convert it into an HFST transducer of type SFST_TYPE and write it to standard output (with the HFST header included).
HfstInputStream in;  // default constructor; "in()" would declare a function instead
HfstTransducer tr(in);
HfstOutputStream out(tr.get_type(), true);
out << tr;
For more information on HFST transducer formats and conversions, see the wiki pages.
Foma writes its binary transducers in gzipped format using the gz tools. However, we experienced problems when trying to write to standard output or read from standard input with the gz tools (the foma tools themselves do not write to or read from standard streams). We therefore chose to write, and accordingly read, foma transducers unzipped when writing or reading binary HfstTransducers of FOMA_TYPE. As a result, when an HfstTransducer of FOMA_TYPE is written in its plain backend format, the user must gzip it before it can be used by foma tools. Similarly, a foma transducer must be unzipped before it can be read by HFST tools.
Suppose we have written a FOMA_TYPE HfstTransducer and want to use it with foma tools. First we write it, in its plain backend format, to file "ab.foma" with the following piece of code:
HfstTransducer ab("a", "b", FOMA_TYPE);
HfstOutputStream out("ab.foma", FOMA_TYPE, false);
out << ab;
The command
gzip ab.foma
will create a file 'ab.foma.gz' that can be used by foma tools.
The same with command line tools:
echo "a:b" | hfst-strings2fst -f foma > ab.hfst
hfst-fst2fst --use-backend-format -f foma -i ab.hfst > ab.foma
gzip ab.foma
An example of the opposite case follows. Suppose we have a foma transducer 'transducer.foma' and want to read it inside an HFST program. A .gz extension must first be appended to the file name so that the program 'gunzip' recognizes it as a zipped file. The commands
mv transducer.foma transducer.foma.gz
gunzip transducer.foma.gz
overwrite the original file 'transducer.foma' with an unzipped version of the same file. Now the file can be used by HFST:
HfstInputStream in("transducer.foma");
HfstTransducer tr(in);
The same with command line tools:
mv transducer.foma transducer.foma.gz
gunzip transducer.foma.gz
hfst-sometool transducer.foma
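Because a foma file must be unzipped before HFST can read it, a program may want to check whether a file is still gzipped before opening it with HfstInputStream. The following is a minimal sketch that uses only the C++ standard library. The function name looks_gzipped is hypothetical and not part of the HFST API; it only tests for the two-byte gzip magic number (0x1f 0x8b) at the start of the file and does not validate the rest of the contents.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not part of the HFST API.
// Returns true if the file at `path` starts with the gzip magic
// number (bytes 0x1f 0x8b), i.e. it still needs to be unzipped
// before it can be read as a FOMA_TYPE backend transducer.
bool looks_gzipped(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    char header[2] = {0, 0};
    if (!file.read(header, 2)) {
        return false;  // missing, unreadable, or shorter than two bytes
    }
    return static_cast<unsigned char>(header[0]) == 0x1f &&
           static_cast<unsigned char>(header[1]) == 0x8b;
}
```

If looks_gzipped("transducer.foma") returns true, the file should first be renamed and unzipped with gunzip as described above.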