HFST - Helsinki Finite-State Transducer Technology - Python API, version 3.12.2
Quick Start to HFST

Using HFST in your own code

After installing HFST on your computer, start Python and run import hfst.

For example, the following simple program

 import hfst

 tr1 = hfst.regex('foo:bar')
 tr2 = hfst.regex('bar:baz')
 tr1.compose(tr2)
 print(tr1)

should print to standard output the following text when run:

 0      1     foo    baz    0
 1      0
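
Composition chains the two relations: tr1 maps the symbol foo to bar, tr2 maps bar to baz, so the composed transducer maps foo to baz. The pure-Python sketch below models this on finite sets of symbol pairs. It only illustrates the semantics of composition, not how HFST implements it, and the helper name compose_relations is hypothetical.

```python
# Toy model: the relation of a finite transducer as a set of (input, output) pairs.
# compose_relations is a hypothetical helper for illustration, not part of hfst.

def compose_relations(r1, r2):
    """Return every (a, c) such that (a, b) is in r1 and (b, c) is in r2."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

tr1 = {("foo", "bar")}  # relation denoted by the regex foo:bar
tr2 = {("bar", "baz")}  # relation denoted by the regex bar:baz

print(compose_relations(tr1, tr2))  # {('foo', 'baz')}
```

The single pair foo:baz is what the printed lines above encode in AT&T format: the line "0 1 foo baz 0" is an arc from state 0 to state 1 with input foo, output baz and weight 0, and the line "1 0" marks state 1 as final with final weight 0.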


Structure of the API

The HFST API is located in the package hfst, which includes classes such as HfstTransducer, HfstBasicTransducer and HfstTokenizer (all used in the examples below).

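There are also functions in the package hfst that are not part of any class, for example hfst.fst, as well as hfst.regex, hfst.tokenized_fst and hfst.set_default_fst_type, which appear in the examples on this page.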

There are also submodules, such as hfst.exceptions, which contains the exception classes caught in the examples below.


Examples of HFST functionalities

An example of creating a simple transducer from scratch, converting between transducer formats, testing transducer properties and handling exceptions:

 import sys
 import hfst

 # Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
 t = hfst.HfstBasicTransducer()
 t.add_state(1)
 t.add_transition(0, 1, 'a', 'b', 0.3)
 t.set_final_weight(1, 0.5)

 # Convert to tropical OpenFst format (the default) and push weights toward the final state.
 T = hfst.HfstTransducer(t)
 T.push_weights_to_end()

 # Convert back to an HFST basic transducer.
 tc = hfst.HfstBasicTransducer(T)
 try:
     weight = tc.get_final_weight(1)
 # Raised if the state does not exist or is not final.
 except hfst.exceptions.HfstException:
     print("TEST FAILED: An exception was thrown.")
     sys.exit(1)

 # Floating-point rounding may affect precision, so compare against a range.
 if 0.79 < weight < 0.81:
     print("TEST PASSED")
     sys.exit(0)
 else:
     print("TEST FAILED")
     sys.exit(1)
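
Why does the test expect a final weight close to 0.8? In the tropical semiring the weight of a path is the sum of its transition weights plus the final weight, here 0.3 + 0.5 = 0.8, and pushing weights toward the end moves that whole amount onto the final state. The sketch below shows the standard reweighting idea behind pushing toward final states, using shortest distances from the initial state, on a toy dictionary-based automaton. The data layout and the functions shortest_distances and push_to_end are assumptions for illustration, not HFST's implementation.

```python
import heapq

# Toy weighted automaton (hypothetical layout, not an HFST class):
# arcs[state] = list of (target, input, output, weight); finals[state] = final weight.
arcs = {0: [(1, "a", "b", 0.3)], 1: []}
finals = {1: 0.5}

def shortest_distances(arcs, start=0):
    """Tropical shortest distance from start to each reachable state.
    Uses Dijkstra's algorithm, so it assumes non-negative weights."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, state = heapq.heappop(queue)
        if d > dist[state]:
            continue  # stale queue entry
        for target, _, _, w in arcs[state]:
            nd = d + w
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(queue, (nd, target))
    return dist

def push_to_end(arcs, finals, start=0):
    """Reweight with potentials d: w'(p->q) = d(p) + w - d(q) and
    rho'(q) = d(q) + rho(q). Every path keeps its total weight."""
    d = shortest_distances(arcs, start)
    new_arcs = {s: [(t, i, o, d[s] + w - d[t]) for (t, i, o, w) in out]
                for s, out in arcs.items() if s in d}
    new_finals = {s: d[s] + rho for s, rho in finals.items() if s in d}
    return new_arcs, new_finals

pushed_arcs, pushed_finals = push_to_end(arcs, finals)
print(f"arc weight {pushed_arcs[0][0][3]:.2f}, final weight {pushed_finals[1]:.2f}")
# arc weight 0.00, final weight 0.80
```

The potentials telescope along any path, so only the placement of the weight changes. Because the computed total is a floating-point sum, the HFST test above checks an interval around 0.8 rather than exact equality.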

An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer.

 import sys
 import hfst

 # Use the foma implementation, since no weights are involved.
 hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)

 # Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
 tok = hfst.HfstTokenizer()
 tok.add_multichar_symbol('foo')
 tok.add_multichar_symbol('bar')
 tok.add_multichar_symbol('baz')

 words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
 t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
 words.disjunct(t)

 # Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
 rule = hfst.regex('bar (->) baz || foo _ foo')

 # Apply the rule transducer to the lexicon.
 words.compose(rule)
 words.minimize()

 # Extract all string pairs from the result and print them to standard output.
 results = {}
 try:
     # Extract paths and remove tokenization: each input string maps to a
     # list of (output string, weight) pairs.
     results = words.extract_paths(output='dict')
 except hfst.exceptions.TransducerIsCyclicException:
     # This should not happen because the transducer is not cyclic.
     print("TEST FAILED")
     sys.exit(1)

 for input_str, outputs in results.items():
     print('%s:' % input_str)
     for output, weight in outputs:
         print('  %s\t%f' % (output, weight))

The output:

 foobarfoo:
   foobarfoo     0.000000
   foobazfoo     0.000000
 foobarbaz:
   foobarbaz     0.000000
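
The result can be cross-checked with a small pure-Python model of the same pipeline. It assumes the tokenizer splits strings by greedy left-to-right longest match over the multicharacter symbols foo, bar and baz (sufficient for these inputs), then applies the optional replacement of bar with baz when the neighbouring input tokens are both foo. The helpers tokenize and apply_optional_rule are hypothetical and handle only single-symbol targets and contexts; this is not a general replace-rule compiler.

```python
# Hypothetical helpers modelling the example above; not part of the hfst package.
MULTICHAR = ["foo", "bar", "baz"]

def tokenize(text, symbols=MULTICHAR):
    """Greedy left-to-right longest-match split (assumed behaviour);
    characters not covered by a symbol become single-character tokens."""
    by_length = sorted(symbols, key=len, reverse=True)
    tokens, pos = [], 0
    while pos < len(text):
        for sym in by_length:
            if text.startswith(sym, pos):
                tokens.append(sym)
                pos += len(sym)
                break
        else:
            tokens.append(text[pos])
            pos += 1
    return tokens

def apply_optional_rule(tokens, target="bar", repl="baz", left="foo", right="foo"):
    """All outputs of 'target (->) repl || left _ right' on one token list.
    Both contexts are checked on the input side, as with '||'."""
    outputs = [[]]
    for i, tok in enumerate(tokens):
        choices = [tok]
        if (tok == target and 0 < i < len(tokens) - 1
                and tokens[i - 1] == left and tokens[i + 1] == right):
            choices.append(repl)  # optional rule: keep or replace
        outputs = [prefix + [c] for prefix in outputs for c in choices]
    return ["".join(out) for out in outputs]

# Same shape as extract_paths(output='dict'): input -> list of (output, weight).
results = {w: [(out, 0.0) for out in apply_optional_rule(tokenize(w))]
           for w in ("foobarfoo", "foobarbaz")}
for input_str, outputs in results.items():
    print('%s:' % input_str)
    for output, weight in outputs:
        print('  %s\t%f' % (output, weight))
```

Run as is, this prints the same three string pairs as the HFST program. The weights are fixed at 0.0 here because the foma-based example is unweighted.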