HFST - Helsinki Finite-State Transducer Technology - Python API
version 3.11.0
After installing HFST on your computer, start Python and execute import hfst.
For example, consider the following simple program:

import hfst
tr1 = hfst.regex('foo:bar')
tr2 = hfst.regex('bar:baz')
tr1.compose(tr2)  # composes in place: tr1 now maps foo to baz
print(tr1)

When run, it should print the following text to standard output:

0	1	foo	baz	0
1	0
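Composition can be pictured on finite relations: tr1 maps foo to bar, tr2 maps bar to baz, so their composition maps foo to baz. The following pure-Python sketch is not the HFST API; the helpers compose_relations and to_att are hypothetical. It models each one-symbol transducer as a set of (input, output) pairs and, assuming a single arc from state 0 to final state 1 with weight 0, prints the result in the same tab-separated layout as the output above.

```python
# Hypothetical model: a one-symbol transducer is a set of (input, output) pairs.


def compose_relations(first, second):
    """Compose two finite relations given as sets of (input, output) pairs.

    (a, c) is in the result when some b exists with (a, b) in first and
    (b, c) in second: the output side of first meets the input side of second.
    """
    return {(a, c) for (a, b1) in first for (b2, c) in second if b1 == b2}


def to_att(pairs):
    """Render each pair as an arc 0 -> 1 with weight 0, followed by final state 1.

    Only suitable for single-arc relations like the example; it is not a
    general transducer printer.
    """
    lines = ['0\t1\t%s\t%s\t0' % (a, c) for (a, c) in sorted(pairs)]
    lines.append('1\t0')
    return '\n'.join(lines)


tr1 = {('foo', 'bar')}  # stands in for hfst.regex('foo:bar')
tr2 = {('bar', 'baz')}  # stands in for hfst.regex('bar:baz')
composed = compose_relations(tr1, tr2)
print(to_att(composed))
```

If the middle symbols do not match (for example composing foo:bar with baz:qux), the sketch returns an empty set, which corresponds to a transducer that accepts nothing.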
The HFST API is located in a package 'hfst' that includes the following classes:
There are also functions in package 'hfst' that are not part of any class, for example hfst.fst, as well as hfst.regex and hfst.tokenized_fst, which are used in the examples on this page.
There are also the following submodules:
An example of creating a simple transducer from scratch, converting it between transducer formats, testing its properties and handling exceptions:
import hfst
import sys

# Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
t = hfst.HfstBasicTransducer()
t.add_state(1)
t.add_transition(0, 1, 'a', 'b', 0.3)
t.set_final_weight(1, 0.5)

# Convert to tropical OpenFst format (the default) and push weights toward the final state.
T = hfst.HfstTransducer(t)
T.push_weights_to_end()

# Convert back to an HFST basic transducer.
tc = hfst.HfstBasicTransducer(T)

try:
    # Rounding might affect the precision, so accept a small interval around 0.8.
    if 0.79 < tc.get_final_weight(1) < 0.81:
        print("TEST PASSED")
        sys.exit(0)
    else:
        print("TEST FAILED")
        sys.exit(1)
except hfst.exceptions.HfstException:
    # If the state does not exist or is not final.
    print("TEST FAILED: An exception was thrown.")
    sys.exit(1)

An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer:
import hfst
import sys

# Use the foma implementation, as no weights are involved.
hfst.set_default_fst_type(hfst.types.FOMA_TYPE)

# Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
tok = hfst.HfstTokenizer()
tok.add_multichar_symbol('foo')
tok.add_multichar_symbol('bar')
tok.add_multichar_symbol('baz')
words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
words.disjunct(t)

# Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
rule = hfst.regex('bar (->) baz || foo _ foo')

# Apply the rule transducer to the lexicon.
words.compose(rule).minimize()

# Extract all string pairs from the result and print them to standard output.
results = None
try:
    # Extract paths and remove tokenization.
    results = words.extract_paths(output='dict')
except hfst.exceptions.TransducerIsCyclicException:
    # This should not happen because the transducer is not cyclic.
    print("TEST FAILED")
    sys.exit(1)

for input_string, outputs in results.items():
    print('%s:' % input_string)
    for output_string, weight in outputs:
        print(' %s\t%f' % (output_string, weight))

The output:

foobarfoo:
 foobarfoo	0.000000
 foobazfoo	0.000000
foobarbaz:
 foobarbaz	0.000000
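To see where these three pairs come from, the following pure-Python sketch redoes the two steps by hand. It is not the HFST implementation, and both helper names are hypothetical: tokenize_longest_match assumes the tokenizer works left to right, preferring the longest multicharacter symbol and otherwise taking one character, and apply_optional_rule handles only a single-symbol rule of the form bar (->) baz || foo _ foo, checking both contexts on the input side. Weights are omitted since this example is unweighted.

```python
from itertools import product


def tokenize_longest_match(text, multichar_symbols):
    """Split text into symbols.

    Assumption: greedy longest match, left to right (a hypothetical model of
    a tokenizer with multicharacter symbols, not HfstTokenizer itself).
    """
    symbols = sorted(multichar_symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No multicharacter symbol matches here: take a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def apply_optional_rule(tokens, target='bar', replacement='baz',
                        left='foo', right='foo'):
    """Return every output string of the optional rule target (->) replacement || left _ right."""
    # Positions where the rule may apply; contexts are checked on the input tokens.
    sites = [i for i in range(1, len(tokens) - 1)
             if tokens[i] == target and tokens[i - 1] == left and tokens[i + 1] == right]
    outputs = set()
    # Optional rule: each site is independently rewritten or left unchanged.
    for choice in product([False, True], repeat=len(sites)):
        out = list(tokens)
        for site, rewrite in zip(sites, choice):
            if rewrite:
                out[site] = replacement
        outputs.add(''.join(out))
    return outputs


symbols = ['foo', 'bar', 'baz']
for word in ['foobarfoo', 'foobarbaz']:
    tokens = tokenize_longest_match(word, symbols)
    print('%s:' % word)
    for output in sorted(apply_optional_rule(tokens)):
        print(' %s' % output)
```

In foobarbaz the right context is baz rather than foo, so there is no rewrite site and the word passes through unchanged, matching the single pair in the HFST output.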
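In the weighted [a:b] example earlier on this page, the only path has weight 0.3 + 0.5 = 0.8, because in the tropical semiring the weights along a path are added. Pushing weights toward the end keeps that path weight but moves it onto the final state, which is why the test accepts a final weight between 0.79 and 0.81. The pure-Python sketch below models just this single-path case with a hypothetical dictionary representation; it is not how HFST or OpenFst push weights, since general weight pushing relies on shortest-distance computations over the whole graph.

```python
# Hypothetical model of the [a:b] transducer from the weighted example:
# arcs maps (source, target) to (input, output, weight); finals maps state -> final weight.
arcs = {(0, 1): ('a', 'b', 0.3)}
finals = {1: 0.5}
path = [0, 1]


def path_weight(arcs, finals, path):
    """Tropical path weight: sum of the arc weights along the path plus the final weight."""
    total = sum(arcs[(src, dst)][2] for src, dst in zip(path, path[1:]))
    return total + finals[path[-1]]


def push_to_end_single_path(arcs, finals, path):
    """Move every arc weight on `path` onto its final state.

    Only valid when `path` is the sole path through the transducer, as here;
    general weight pushing needs shortest-distance computations.
    """
    pushed_arcs = dict(arcs)
    moved = 0.0
    for src, dst in zip(path, path[1:]):
        isym, osym, weight = pushed_arcs[(src, dst)]
        pushed_arcs[(src, dst)] = (isym, osym, 0.0)
        moved += weight
    pushed_finals = dict(finals)
    pushed_finals[path[-1]] += moved
    return pushed_arcs, pushed_finals


new_arcs, new_finals = push_to_end_single_path(arcs, finals, path)
print('final weight after pushing: %.6f' % new_finals[1])
print('path weight before: %.6f, after: %.6f'
      % (path_weight(arcs, finals, path), path_weight(new_arcs, new_finals, path)))
```

Both printed path weights are 0.800000: pushing changes where the weight sits, not the total weight of the path, and the transition weight becomes 0.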