HFST - Helsinki Finite-State Transducer Technology - Python API
version 3.12.3 (under development)

After installing HFST on your computer, start Python and execute 'import hfst'.
For example, the following simple program
 import hfst
 tr1 = hfst.regex('foo:bar')
 tr2 = hfst.regex('bar:baz')
 tr1.compose(tr2)
 print(tr1)
should print the following text to standard output when run:
0 1 foo baz 0
1 0
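The same result can also be examined as Python data instead of the AT&T-style text above. The following is a minimal sketch that uses extract_paths with the 'dict' output format (the same format used in the longer example further below) to get a dictionary mapping each input string to its output strings and weights:
 import hfst
 tr1 = hfst.regex('foo:bar')
 tr2 = hfst.regex('bar:baz')
 tr1.compose(tr2)
 # extract_paths(output='dict') returns {input: [(output, weight), ...]}.
 for input_str, outputs in tr1.extract_paths(output='dict').items():
     for output, weight in outputs:
         print('%s:%s %f' % (input_str, output, weight))
This should print a single pair, foo mapped to baz with weight zero.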
The HFST API is located in the package 'hfst', which includes classes such as HfstTransducer, HfstBasicTransducer and HfstTokenizer.
There are also functions in the package 'hfst' that are not part of any class, for example hfst.fst and hfst.regex (see the sketch below).
There are also submodules, such as hfst.exceptions.
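As a quick illustration of the module-level functions, the sketch below builds a transducer for the word 'cat' in two ways: hfst.fst constructs it directly from the string, and hfst.regex compiles it from a regular expression in which each character is written as a separate symbol. The claim that the two results are equivalent, checked here with compare, is an assumption that should be verified against the function documentation:
 import hfst
 # Build a transducer that accepts the word 'cat', one character per symbol.
 t1 = hfst.fst('cat')
 # Compile the same language from a regular expression.
 t2 = hfst.regex('c a t')
 # compare tests whether two transducers are equivalent.
 print(t1.compare(t2))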
An example of creating a simple transducer from scratch, converting between transducer formats, testing transducer properties and handling exceptions:
 import hfst
 # Create as HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
 t = hfst.HfstBasicTransducer()
 t.add_state(1)
 t.add_transition(0, 1, 'a', 'b', 0.3)
 t.set_final_weight(1, 0.5)
 # Convert to tropical OpenFst format (the default) and push weights toward final state.
 T = hfst.HfstTransducer(t)
 T.push_weights_to_end()
 # Convert back to HFST basic transducer.
 tc = hfst.HfstBasicTransducer(T)
 try:
     # Rounding might affect the precision.
     if (0.79 < tc.get_final_weight(1)) and (tc.get_final_weight(1) < 0.81):
         print("TEST PASSED")
         exit(0)
     else:
         print("TEST FAILED")
         exit(1)
 # If the state does not exist or is not final
 except hfst.exceptions.HfstException as e:
     print("TEST FAILED: An exception was thrown.")
     exit(1)
An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer:
 import hfst
 hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE) # use the foma implementation, as no weights are involved
 # Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
 tok = hfst.HfstTokenizer()
 tok.add_multichar_symbol('foo')
 tok.add_multichar_symbol('bar')
 tok.add_multichar_symbol('baz')
 words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
 t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
 words.disjunct(t)
 # Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
 rule = hfst.regex('bar (->) baz || foo _ foo')
 # Apply the rule transducer to the lexicon.
 words.compose(rule)
 words.minimize()
 # Extract all string pairs from the result and print them to standard output.
 results = 0
 try:
     # Extract paths and remove tokenization
     results = words.extract_paths(output='dict')
 except hfst.exceptions.TransducerIsCyclicException as e:
     # This should not happen because the transducer is not cyclic.
     print("TEST FAILED")
     exit(1)
 for input, outputs in results.items():
     print('%s:' % input)
     for output in outputs:
         print('  %s\t%f' % (output[0], output[1]))
The output:
foobarfoo:
  foobarfoo    0.000000
  foobazfoo    0.000000
foobarbaz:
  foobarbaz    0.000000
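A result like the ones above can also be inspected state by state by converting it back to an HfstBasicTransducer. The following minimal sketch prints each transition and each final state of the foo:bar transducer; it assumes the HfstBasicTransducer methods states(), transitions() and is_final_state() and the HfstBasicTransition accessors get_target_state(), get_input_symbol(), get_output_symbol() and get_weight(), so check the class documentation for the exact interface:
 import hfst
 tr = hfst.regex('foo:bar')
 fsm = hfst.HfstBasicTransducer(tr)
 # Print the transducer in an AT&T-like format, one transition per line.
 for state in fsm.states():
     for arc in fsm.transitions(state):
         print('%i %i %s %s %f' % (state, arc.get_target_state(),
                                   arc.get_input_symbol(),
                                   arc.get_output_symbol(),
                                   arc.get_weight()))
     if fsm.is_final_state(state):
         print('%i %f' % (state, fsm.get_final_weight(state)))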
 