HFST - Helsinki Finite-State Transducer Technology - Python API
version 3.12.0
After installing HFST on your computer, start Python and execute import hfst.
For example, the following simple program
    import hfst
    tr1 = hfst.regex('foo:bar')
    tr2 = hfst.regex('bar:baz')
    tr1.compose(tr2)
    print(tr1)
should print to standard output the following text when run:
    0	1	foo	baz	0
    1	0
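To see why the composed transducer maps foo to baz, it can help to view each transducer as a finite set of (input, output) string pairs. The following plain-Python sketch does not use HFST at all; the function name compose_relations is hypothetical and only illustrates the idea of composition on such sets.

```python
def compose_relations(first, second):
    """Compose two finite relations given as sets of (input, output) pairs.

    A pair (x, z) is in the result when some y exists such that
    (x, y) is in `first` and (y, z) is in `second`.
    """
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}


# The relations described by the regular expressions foo:bar and bar:baz.
tr1 = {("foo", "bar")}
tr2 = {("bar", "baz")}

print(compose_relations(tr1, tr2))
```

Composition is order-sensitive: in this model, composing tr2 with tr1 yields the empty relation, because the output baz of tr2 never matches the input foo of tr1.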
The HFST API is located in a package 'hfst' that includes the following classes:
There are also functions in the package 'hfst' that are not part of any class, for example hfst.fst.
There are also the following submodules:
An example of creating a simple transducer from scratch, converting between transducer formats, testing transducer properties, and handling exceptions:
    import hfst

    # Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
    t = hfst.HfstBasicTransducer()
    t.add_state(1)
    t.add_transition(0, 1, 'a', 'b', 0.3)
    t.set_final_weight(1, 0.5)

    # Convert to tropical OpenFst format (the default) and push weights toward final state.
    T = hfst.HfstTransducer(t)
    T.push_weights_to_end()

    # Convert back to HFST basic transducer.
    tc = hfst.HfstBasicTransducer(T)
    try:
        # Rounding might affect the precision.
        if (0.79 < tc.get_final_weight(1)) and (tc.get_final_weight(1) < 0.81):
            print("TEST PASSED")
            exit(0)
        else:
            print("TEST FAILED")
            exit(1)
    # If the state does not exist or is not final
    except hfst.exceptions.HfstException as e:
        print("TEST FAILED: An exception was thrown.")
        exit(1)
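The test above expects a final weight of about 0.8 because, in the tropical semiring, weights along a path are added, and pushing weights to the end moves the transition weight 0.3 onto the final weight 0.5. The following is a minimal plain-Python sketch of that arithmetic for a single path; it does not call HFST, and the function name push_weights_to_end_single_path is hypothetical.

```python
import math


def push_weights_to_end_single_path(transition_weights, final_weight):
    """Model weight pushing on one path in the tropical semiring.

    The total path weight (sum of transition weights plus final weight)
    is preserved; all of it ends up on the final weight.
    """
    total = sum(transition_weights) + final_weight
    pushed_transitions = [0.0 for _ in transition_weights]
    return pushed_transitions, total


# The single path of the example: one transition a:b with weight 0.3,
# ending in a final state with weight 0.5.
transitions, final = push_weights_to_end_single_path([0.3], 0.5)

print(transitions)
print(math.isclose(final, 0.8))
print(0.79 < final < 0.81)
```

The range check in the original test, rather than an equality check, allows for floating-point rounding in the underlying weight representation.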
An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer.
    import hfst
    hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE) # we use foma implementation as there are no weights involved

    # Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
    tok = hfst.HfstTokenizer()
    tok.add_multichar_symbol('foo')
    tok.add_multichar_symbol('bar')
    tok.add_multichar_symbol('baz')

    words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
    t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
    words.disjunct(t)

    # Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
    rule = hfst.regex('bar (->) baz || foo _ foo')

    # Apply the rule transducer to the lexicon.
    words.compose(rule).minimize()

    # Extract all string pairs from the result and print them to standard output.
    results = 0
    try:
        # Extract paths and remove tokenization
        results = words.extract_paths(output='dict')
    except hfst.exceptions.TransducerIsCyclicException as e:
        # This should not happen because transducer is not cyclic.
        print("TEST FAILED")
        exit(1)

    for input,outputs in results.items():
        print('%s:' % input)
        for output in outputs:
            print(' %s\t%f' % (output[0], output[1]))
The output:
    foobarfoo:
     foobarfoo	0.000000
     foobazfoo	0.000000
    foobarbaz:
     foobarbaz	0.000000
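The output can be checked by hand with a simplified model of the two steps involved: splitting each word into the multicharacter symbols foo, bar and baz, and then optionally replacing bar with baz when it stands between foo and foo. The sketch below is plain Python and is not how HFST implements tokenization or replace rules; the functions tokenize and optional_replace are hypothetical helpers, and the longest-match strategy is an assumption made for illustration.

```python
from itertools import product

MULTICHAR_SYMBOLS = ["foo", "bar", "baz"]


def tokenize(text, symbols=MULTICHAR_SYMBOLS):
    """Split text into tokens, preferring the longest matching multichar symbol.

    Characters not covered by any multichar symbol become single-character tokens.
    """
    ordered = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        match = next((s for s in ordered if text.startswith(s, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def optional_replace(tokens, old="bar", new="baz", left="foo", right="foo"):
    """Return every output of optionally replacing `old` by `new` between `left` and `right`.

    The context is checked on the input tokens only.
    """
    choices = []
    for i, tok in enumerate(tokens):
        in_context = (
            tok == old
            and 0 < i < len(tokens) - 1
            and tokens[i - 1] == left
            and tokens[i + 1] == right
        )
        # An optional rule keeps the original token and also allows the replacement.
        choices.append([tok, new] if in_context else [tok])
    return sorted("".join(combo) for combo in product(*choices))


for word in ["foobarfoo", "foobarbaz"]:
    print(word, optional_replace(tokenize(word)))
```

In this model foobarfoo yields both foobarfoo and foobazfoo, while foobarbaz is left unchanged because bar is followed by baz rather than foo, which matches the string pairs listed above.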