HFST - Helsinki Finite-State Transducer Technology - Python API
version 3.12.0
A synchronous finite-state transducer.
Public Member Functions
def __init__ | Create an empty transducer.
def __init__ | Create a deep copy of HfstTransducer another or a transducer equivalent to HfstBasicTransducer another.
def __init__ | Create an HFST transducer equivalent to HfstBasicTransducer t.
def __str__ | An AT&T representation of the transducer.
def compare | Whether this transducer and another are equivalent.
def compose | Compose this transducer with another.
def compose_intersect | Compose this transducer with the intersection of transducers in v.
def concatenate | Concatenate this transducer with another.
def conjunct | Alias for intersect.
def convert | Convert the transducer into an equivalent transducer in format type.
def copy | Return a deep copy of the transducer.
def cross_product | Make cross product of this transducer with another.
def determinize | Determinize the transducer.
def disjunct | Disjunct this transducer with another.
def eliminate_flag | Eliminate flag diacritic symbol from the transducer.
def eliminate_flags | Eliminate flag diacritics listed in symbols from the transducer.
def extract_longest_paths | Extract longest paths of the transducer.
def extract_paths | Extract paths that are recognized by the transducer.
def extract_shortest_paths | Extract shortest paths of the transducer.
def get_alphabet | Get the alphabet of the transducer.
def get_name | Get the name of the transducer.
def get_properties | Get all properties from the transducer.
def get_property | Get the arbitrary string property property.
def get_type | The implementation type of the transducer.
def has_flag_diacritics | Whether the transducer has flag diacritics in its transitions.
def input_project | Extract the input language of the transducer.
def insert_freely | Freely insert a transition or a transducer into the transducer.
def insert_to_alphabet | Explicitly insert symbol to the alphabet of the transducer.
def intersect | Intersect this transducer with another.
def invert | Swap the input and output symbols of each transition in the transducer.
def is_automaton | Whether each transition in the transducer has equivalent input and output symbols.
def is_cyclic | Whether the transducer is cyclic.
def is_implementation_type_available | Whether HFST is linked to the transducer library needed by implementation type type.
def is_infinitely_ambiguous | Whether the transducer is infinitely ambiguous.
def is_lookup_infinitely_ambiguous | Whether lookup of path input will have infinite results.
def lenient_composition | Perform a lenient composition on this transducer and another.
def longest_path_size | Get length of longest path of the transducer.
def lookup_optimize | Lookup string input.
def minimize | Minimize the transducer.
def minus | Alias for subtract.
def n_best | Extract n best paths of the transducer.
def number_of_arcs | The number of transitions in the transducer.
def number_of_states | The number of states in the transducer.
def optionalize | Disjunct the transducer with an epsilon transducer.
def output_project | Extract the output language of the transducer.
def priority_union | Make priority union of this transducer with another.
def prune | Make transducer coaccessible.
def push_weights_to_end | Push weights towards final state(s).
def push_weights_to_start | Push weights towards initial state.
def remove_epsilons | Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.
def remove_from_alphabet | Remove symbol from the alphabet of the transducer.
def remove_optimization | Remove lookup optimization.
def repeat_n | A concatenation of n transducers.
def repeat_n_minus | A concatenation of N transducers where N is any number from zero to n, inclusive.
def repeat_n_plus | A concatenation of N transducers where N is any number from n to infinity, inclusive.
def repeat_n_to_k | A concatenation of N transducers where N is any number from n to k, inclusive.
def repeat_plus | A concatenation of N transducers where N is any number from one to infinity.
def repeat_star | A concatenation of N transducers where N is any number from zero to infinity.
def reverse | Reverse the transducer.
def set_final_weights | Set the weights of all final states to weight.
def set_name | Rename the transducer to name.
def set_property | Set arbitrary string property property to value.
def shuffle | Shuffle this transducer with transducer another.
def substitute | Substitute symbols or transitions in the transducer.
def subtract | Subtract transducer another from this transducer.
def write | Write the transducer in binary format to ostr.
def write_att | Write the transducer in AT&T format to file f, write_weights defines whether weights are written.
def write_att | Write the transducer in AT&T format to file ofile, write_weights defines whether weights are written.
def write_att | Write the transducer in AT&T format to file named filename.
def write_prolog | Write the transducer in prolog format with name name to file f, write_weights defines whether weights are written.
A synchronous finite-state transducer.
Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.
    # transducer is reversed
    transducer.reverse()
    # transducer2 is not modified, but a copy of it is disjuncted with
    # transducer1
    transducer1.disjunct(transducer2)
    # a chain of functions is possible
    transducer.reverse().determinize().reverse().determinize()
Currently, an HfstTransducer has three implementation types that are well supported. When an HfstTransducer is created, its type is defined with an argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:
    # this will cause a TransducerTypeMismatchException:
    tropical_transducer.disjunct(foma_transducer)
    # this works, but weights are lost in the conversion
    tropical_transducer.convert(hfst.ImplementationType.SFST_TYPE).disjunct(sfst_transducer)
    # this works, information is not lost
    tropical_transducer.disjunct(sfst_transducer.convert(hfst.ImplementationType.TROPICAL_OPENFST_TYPE))
With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with hfst.HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions.
def __init__(self)
Create an empty transducer.
    tr = hfst.HfstTransducer()
    assert(tr.compare(hfst.empty_fst()))
def __init__(self, another)
Create a deep copy of HfstTransducer another or a transducer equivalent to HfstBasicTransducer another.
another | An HfstTransducer or HfstBasicTransducer. |
An example:
    tr1 = hfst.regex('foo bar foo')
    tr2 = hfst.HfstTransducer(tr1)
    tr2.substitute('foo','FOO')
    tr1.concatenate(tr2)
def __init__(self, t, type)
Create an HFST transducer equivalent to HfstBasicTransducer t.
The type of the created transducer is defined by type.
t | An HfstBasicTransducer. |
type | The type of the resulting transducer. If you want to use the default type, you can just call hfst.HfstTransducer(t). |
def __str__(self)
An AT&T representation of the transducer.
Defined for the print command. An example:
    >>> print(hfst.regex('[foo:bar::2]+'))
    0 1 foo bar 2.000000
    1 1 foo bar 2.000000
    1 0.000000
def compare(self, another)
Whether this transducer and another are equivalent.
another | The compared transducer. |
Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments.
def compose(self, another)
Compose this transducer with another.
another | The second argument in the composition. Not modified. |
def compose_intersect(self, v, invert=False)
Compose this transducer with the intersection of transducers in v.
If invert is true, then compose the intersection of the transducers in v with this transducer.
The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.
v | A tuple of transducers. |
invert | Whether the intersection of the transducers in v is composed with this transducer. |
def concatenate(self, another)
Concatenate this transducer with another.
def conjunct(self, another)
Alias for intersect.
def convert(self, type, options='')
Convert the transducer into an equivalent transducer in format type.
If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.
A transducer of type hfst.ImplementationType.SFST_TYPE, hfst.ImplementationType.TROPICAL_OPENFST_TYPE, hfst.ImplementationType.LOG_OPENFST_TYPE or hfst.ImplementationType.FOMA_TYPE can be converted into an hfst.ImplementationType.HFST_OL_TYPE or hfst.ImplementationType.HFST_OLW_TYPE transducer, but an hfst.ImplementationType.HFST_OL_TYPE or hfst.ImplementationType.HFST_OLW_TYPE transducer cannot be converted to any other type.
def copy(self)
Return a deep copy of the transducer.
    tr = hfst.regex('[foo:bar::0.3]*')
    TR = tr.copy()
    assert(tr.compare(TR))
def cross_product(self, another)
Make cross product of this transducer with another.
It pairs every string of this transducer with every string of another. If the strings are not the same length, epsilon padding is added at the end of the shorter string.
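The pairing rule can be modelled in plain Python. This is only a sketch over finite lists of strings, not the HFST implementation: it assumes every character is one symbol, and the `EPSILON` placeholder and the helper name `cross_product_pairs` are hypothetical.

```python
from itertools import product, zip_longest

# Assumed placeholder for the epsilon symbol (hypothetical, for illustration).
EPSILON = "@0@"

def cross_product_pairs(left, right):
    """Pair every string in `left` with every string in `right`, symbol by symbol."""
    result = []
    for a, b in product(left, right):
        # zip_longest pads the shorter string with epsilon at the end.
        result.append(list(zip_longest(a, b, fillvalue=EPSILON)))
    return result

pairs = cross_product_pairs(["ab"], ["xyz", "q"])
for path in pairs:
    print(path)
```

Each printed list is one aligned path of input:output symbol pairs, with epsilon filling the positions where one string has run out.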
def determinize(self)
Determinize the transducer.
Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.
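The property stated above can be checked on a plain list of transitions. The sketch below is a pure Python illustration, assuming transitions are given as `(source, target, input, output)` tuples; the function name `is_deterministic` is hypothetical and it does not perform determinization itself.

```python
def is_deterministic(transitions):
    """Return True if no state has two transitions with the same input:output pair."""
    seen = set()
    for source, _target, isymbol, osymbol in transitions:
        key = (source, isymbol, osymbol)
        if key in seen:
            return False
        seen.add(key)
    return True

# State 0 has two a:b transitions, so this list is not deterministic.
nondeterministic = [(0, 1, "a", "b"), (0, 2, "a", "b"), (1, 3, "c", "c")]
# The same symbols leaving different states are fine.
deterministic = [(0, 1, "a", "b"), (1, 2, "a", "b")]

print(is_deterministic(nondeterministic))
print(is_deterministic(deterministic))
```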
def disjunct(self, another)
Disjunct this transducer with another.
def eliminate_flag(self, symbol)
Eliminate flag diacritic symbol from the transducer.
symbol | The flag to be eliminated. TODO: explain more. |
Returns an equivalent transducer that contains no occurrences of the flag symbol.
def eliminate_flags(self, symbols)
Eliminate flag diacritics listed in symbols from the transducer.
symbols | The flags to be eliminated. TODO: explain more. |
Returns an equivalent transducer that contains none of the flags listed in symbols.
def extract_longest_paths(self, kwargs)
Extract longest paths of the transducer.
def extract_paths(self, kwargs)
Extract paths that are recognized by the transducer.
kwargs | Arguments recognized are filter_flags, max_cycles, max_number, obey_flags, output, random. |
filter_flags | Whether flags diacritics are filtered out from the result (default True). |
max_cycles | Indicates how many times a cycle will be followed, with negative numbers indicating unlimited (default -1 i.e. unlimited). |
max_number | The total number of resulting strings is capped at this value, with 0 or negative indicating unlimited (default -1 i.e. unlimited). |
obey_flags | Whether flag diacritics are validated (default True). |
output | Output format. Values recognized: 'text', 'raw', 'dict' (the default). 'text' returns a string where paths are separated by newlines and each path is represented as input_string + ":" + output_string + "\t" + weight. 'raw' yields a tuple of all paths where each path is a 2-tuple consisting of a weight and a tuple of all transition symbol pairs, each symbol pair being a 2-tuple of an input and an output symbol. 'dict' gives a dictionary that maps each input string into a list of possible outputs, each output being a 2-tuple of an output string and a weight. |
random | Whether result strings are fetched randomly (default False). |
An example:
    >>> tr = hfst.regex('a:b+ (a:c+)')
    >>> print(tr)
    0 1 a b 0.000000
    1 1 a b 0.000000
    1 2 a c 0.000000
    1 0.000000
    2 2 a c 0.000000
    2 0.000000
    >>> print(tr.extract_paths(max_cycles=1, output='text'))
    a:b 0
    aa:bb 0
    aaa:bbc 0
    aaaa:bbcc 0
    aa:bc 0
    aaa:bcc 0
    >>> print(tr.extract_paths(max_number=4, output='text'))
    a:b 0
    aa:bc 0
    aaa:bcc 0
    aaaa:bccc 0
    >>> print(tr.extract_paths(max_cycles=1, max_number=4, output='text'))
    a:b 0
    aa:bb 0
    aa:bc 0
    aaa:bcc 0
TransducerIsCyclicException |
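The relationship between the three output formats described above can be illustrated without HFST. The sketch below takes data shaped like the 'raw' format and derives the 'dict' and 'text' shapes from it. The helper names are hypothetical, epsilon symbols are not treated specially, and the way weights are printed here is an assumption that may differ from what HFST prints.

```python
def raw_to_dict(raw_paths):
    """Map each input string to a list of (output string, weight) 2-tuples."""
    result = {}
    for weight, symbol_pairs in raw_paths:
        input_string = "".join(isymbol for isymbol, _ in symbol_pairs)
        output_string = "".join(osymbol for _, osymbol in symbol_pairs)
        result.setdefault(input_string, []).append((output_string, weight))
    return result

def raw_to_text(raw_paths):
    """Render one path per line as input:output, a tab, and the weight."""
    lines = []
    for weight, symbol_pairs in raw_paths:
        input_string = "".join(isymbol for isymbol, _ in symbol_pairs)
        output_string = "".join(osymbol for _, osymbol in symbol_pairs)
        lines.append(f"{input_string}:{output_string}\t{weight}")
    return "\n".join(lines)

# Two paths in the 'raw' shape: (weight, ((input, output), ...)).
raw = (
    (0.0, (("a", "b"),)),
    (0.0, (("a", "b"), ("a", "c"))),
)

as_dict = raw_to_dict(raw)
as_text = raw_to_text(raw)
print(as_dict)
print(as_text)
```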
def extract_shortest_paths(self)
Extract shortest paths of the transducer.
def get_alphabet(self)
Get the alphabet of the transducer.
The alphabet is defined as the set of symbols known to the transducer.
def get_name(self)
Get the name of the transducer.
def get_properties(self)
Get all properties from the transducer.
def get_property(self, property)
Get the arbitrary string property property.
property | The name of the property whose value is returned. get_property('name') works like get_name(). |
def get_type(self)
The implementation type of the transducer.
def has_flag_diacritics(self)
Whether the transducer has flag diacritics in its transitions.
def input_project(self)
Extract the input language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.
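The symbol-pair rewriting behind input_project, and the analogous rules documented for output_project and invert, can be sketched on a plain transition list. This is a pure Python model, assuming transitions are `(source, target, input, output, weight)` tuples; it does not call HFST, and the functions below are stand-ins that share only their names with the methods.

```python
def input_project(transitions):
    """Change every isymbol:osymbol pair to isymbol:isymbol."""
    return [(s, t, i, i, w) for (s, t, i, o, w) in transitions]

def output_project(transitions):
    """Change every isymbol:osymbol pair to osymbol:osymbol."""
    return [(s, t, o, o, w) for (s, t, i, o, w) in transitions]

def invert(transitions):
    """Swap the input and output symbol of every transition."""
    return [(s, t, o, i, w) for (s, t, i, o, w) in transitions]

def is_automaton(transitions):
    """True if every transition has identical input and output symbols."""
    return all(i == o for (_s, _t, i, o, _w) in transitions)

transitions = [(0, 1, "a", "b", 0.0), (1, 1, "a", "c", 0.0)]

print(input_project(transitions))
print(output_project(transitions))
print(invert(transitions))
```

In this model, the result of either projection always satisfies the is_automaton condition, since both sides of every pair become the same symbol.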
def insert_freely(self, ins)
Freely insert a transition or a transducer into the transducer.
ins | The transition or transducer to be inserted. |
If ins is a transition, i.e. a 2-tuple of strings: A transition is added to each state in this transducer. The transition leads from that state to itself with input and output symbols defined by ins. The weight of the transition is zero.
If ins is an hfst.HfstTransducer: A copy of ins is attached with epsilon transitions to each state of this transducer. After the operation, for each state S in this transducer, there is an epsilon transition that leads from state S to the initial state of ins, and for each final state of ins, there is an epsilon transition that leads from that final state to state S in this transducer. The weights of the final states in ins are copied to the epsilon transitions leading to state S.
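The transition case can be sketched in plain Python: one zero-weight self-loop per state. The data layout (a set of states plus `(source, target, input, output, weight)` tuples) and the function below are assumptions made for illustration, not the HFST implementation, and the transducer case is not modelled.

```python
def insert_freely(states, transitions, symbol_pair):
    """Add a zero-weight self-loop labelled with symbol_pair to every state."""
    isymbol, osymbol = symbol_pair
    loops = [(state, state, isymbol, osymbol, 0.0) for state in sorted(states)]
    # Return a new list so the original transitions stay untouched.
    return list(transitions) + loops

states = {0, 1}
transitions = [(0, 1, "a", "a", 0.0)]
result = insert_freely(states, transitions, ("x", "y"))
for transition in result:
    print(transition)
```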
def insert_to_alphabet(self, symbol)
Explicitly insert symbol to the alphabet of the transducer.
symbol | The symbol (string) to be inserted. |
def intersect(self, another)
Intersect this transducer with another.
def invert(self)
Swap the input and output symbols of each transition in the transducer.
def is_automaton(self)
Whether each transition in the transducer has equivalent input and output symbols.
def is_cyclic(self)
Whether the transducer is cyclic.
def is_implementation_type_available(type)
Whether HFST is linked to the transducer library needed by implementation type type.
def is_infinitely_ambiguous(self)
Whether the transducer is infinitely ambiguous.
A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.
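One way to picture the condition above is a cycle search restricted to transitions whose input symbol is epsilon. The sketch below is a simplified pure Python model, not HFST's algorithm: it assumes `(source, target, input, output)` tuples, uses a hypothetical `EPSILON` placeholder, and ignores whether the loop is actually reachable on an accepting path.

```python
# Assumed placeholder for the epsilon symbol (hypothetical, for illustration).
EPSILON = "@0@"

def has_input_epsilon_cycle(transitions):
    """Depth-first search for a cycle that uses only epsilon-input transitions."""
    graph = {}
    for source, target, isymbol, _osymbol in transitions:
        if isymbol == EPSILON:
            graph.setdefault(source, []).append(target)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(state):
        colour[state] = GREY
        for successor in graph.get(state, []):
            status = colour.get(successor, WHITE)
            if status == GREY:
                return True  # back edge: an epsilon-input loop exists
            if status == WHITE and visit(successor):
                return True
        colour[state] = BLACK
        return False

    return any(colour.get(s, WHITE) == WHITE and visit(s) for s in list(graph))

looping = [(0, 1, "a", "b"), (1, 1, EPSILON, "c")]
acyclic = [(0, 1, EPSILON, "b"), (1, 2, "a", "a")]
print(has_input_epsilon_cycle(looping))
print(has_input_epsilon_cycle(acyclic))
```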
def is_lookup_infinitely_ambiguous(self, tok_input)
Whether lookup of path tok_input will have infinite results.
Currently, this function will return whether the transducer is infinitely ambiguous on any lookup path found in the transducer, i.e. the argument tok_input is ignored.
def lenient_composition(self, another)
Perform a lenient composition on this transducer and another.
TODO: explain more.
def longest_path_size(self, kwargs)
Get length of longest path of the transducer.
def lookup_optimize(self)
Lookup string input.
input | The input. A string or a pre-tokenized tuple of symbols (i.e. a tuple of strings). |
kwargs | Possible parameters and their default values are: obey_flags=True, max_number=-1, time_cutoff=0.0, output='tuple' |
obey_flags | Whether flag diacritics are obeyed. Always True for HFST_OL(W)_TYPE transducers. |
max_number | Maximum number of results returned, defaults to -1, i.e. infinity. |
time_cutoff | How long the function can search for results before returning, expressed in seconds. Defaults to 0.0, i.e. infinitely. Always 0.0 for transducers that are not of HFST_OL(W)_TYPE. |
output | Possible values are 'tuple', 'text' and 'raw', 'tuple' being the default. |
def minimize(self)
Minimize the transducer.
Minimizing a transducer yields an equivalent transducer with the smallest number of states.
def minus(self, another)
Alias for subtract.
def n_best(self, n)
Extract n best paths of the transducer.
In the case of a weighted transducer (hfst.ImplementationType.TROPICAL_OPENFST_TYPE or hfst.ImplementationType.LOG_OPENFST_TYPE), best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (hfst.ImplementationType.SFST_TYPE or hfst.ImplementationType.FOMA_TYPE), the function returns random paths.
This function is not implemented for hfst.ImplementationType.FOMA_TYPE or hfst.ImplementationType.SFST_TYPE. If this function is called by an HfstTransducer of type hfst.ImplementationType.FOMA_TYPE or hfst.ImplementationType.SFST_TYPE, it is converted to hfst.ImplementationType.TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to hfst.ImplementationType.FOMA_TYPE or hfst.ImplementationType.SFST_TYPE. If HFST is not linked to OpenFst library, an hfst.exceptions.ImplementationTypeNotAvailableException is thrown.
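For the weighted case, "best" means lowest weight. A minimal pure Python sketch of that selection rule follows, assuming paths are already available as `(path string, weight)` pairs; the stand-in function and the sample data are hypothetical and say nothing about how HFST searches the transducer.

```python
import heapq

def n_best(paths, n):
    """Return the n paths with the lowest weight, lightest first."""
    return heapq.nsmallest(n, paths, key=lambda path: path[1])

paths = [("foo:bar", 2.0), ("foo:baz", 0.5), ("foo:foo", 1.0)]
best_two = n_best(paths, 2)
print(best_two)
```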
def number_of_arcs(self)
The number of transitions in the transducer.
def number_of_states(self)
The number of states in the transducer.
def optionalize(self)
Disjunct the transducer with an epsilon transducer.
def output_project(self)
Extract the output language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.
def priority_union(self, another)
Make priority union of this transducer with another.
For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on the left side, the path in t2 overrides the path in t1.
Example
    Transducer 1 (t1):
    a : a
    b : b
    Transducer 2 (t2):
    b : B
    c : C
    Result ( t1.priority_union(t2) ):
    a : a
    b : B
    c : C
For more information, read fsmbook.
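The override rule can be modelled with dictionaries that map each left-side string to its set of right-side strings. This is a pure Python sketch of the semantics for finite relations only, with a hypothetical stand-in function; it does not reflect how HFST computes the result on transducers.

```python
def priority_union(t1, t2):
    """Union of two relations where entries of t2 override entries of t1."""
    result = {left: set(rights) for left, rights in t1.items()}
    for left, rights in t2.items():
        # Same left-side string: the outputs from t2 replace those from t1.
        result[left] = set(rights)
    return result

t1 = {"a": {"a"}, "b": {"b"}}
t2 = {"b": {"B"}, "c": {"C"}}
union = priority_union(t1, t2)
print(union)
```

The data mirrors the example above: "b" keeps only the mapping from t2, while "a" and "c" are carried over unchanged.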
def prune(self)
Make transducer coaccessible.
A transducer is coaccessible iff there is a path from every state to a final state.
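Coaccessibility can be computed by walking the transitions backwards from the final states. The sketch below is a pure Python illustration, assuming states are integers and transitions are `(source, target)` pairs with labels omitted; the helper names are hypothetical and this is not HFST's implementation.

```python
def coaccessible_states(transitions, finals):
    """Collect every state from which some final state can be reached."""
    predecessors = {}
    for source, target in transitions:
        predecessors.setdefault(target, set()).add(source)

    seen = set(finals)
    stack = list(finals)
    while stack:
        state = stack.pop()
        for previous in predecessors.get(state, ()):
            if previous not in seen:
                seen.add(previous)
                stack.append(previous)
    return seen

def prune(states, transitions, finals):
    """Keep only coaccessible states and the transitions between them."""
    keep = coaccessible_states(transitions, finals) & set(states)
    kept_transitions = [(s, t) for (s, t) in transitions if s in keep and t in keep]
    return keep, kept_transitions

# State 3 is a dead end: no final state can be reached from it.
kept_states, kept_transitions = prune({0, 1, 2, 3}, [(0, 1), (1, 2), (1, 3)], {2})
print(kept_states, kept_transitions)
```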
def push_weights_to_end(self)
Push weights towards final state(s).
If the HfstTransducer is of unweighted type (hfst.ImplementationType.SFST_TYPE or hfst.ImplementationType.FOMA_TYPE), nothing is done.
An example:
    >>> import hfst
    >>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
    >>> tr.push_weights_to_end()
    >>> print(tr)
    0 1 a a 0.000000
    1 2 a b 0.000000
    2 3 b b 0.000000
    2 2.000000
    3 2.000000
def push_weights_to_start(self)
Push weights towards initial state.
If the HfstTransducer is of unweighted type (hfst.ImplementationType.SFST_TYPE or hfst.ImplementationType.FOMA_TYPE), nothing is done.
An example:
    >>> import hfst
    >>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
    >>> tr.push_weights_to_start()
    >>> print(tr)
    0 1 a a 2.000000
    1 2 a b 0.000000
    2 3 b b 0.000000
    2 0.000000
    3 0.000000
def remove_epsilons(self)
Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.
def remove_from_alphabet(self, symbol)
Remove symbol from the alphabet of the transducer.
symbol | The symbol (string) to be removed. |
def remove_optimization(self)
Remove lookup optimization.
This effectively converts the transducer (back) into the default fst type.
def repeat_n(self, n)
A concatenation of n transducers.
def repeat_n_minus(self, n)
A concatenation of N transducers where N is any number from zero to n, inclusive.
def repeat_n_plus(self, n)
A concatenation of N transducers where N is any number from n to infinity, inclusive.
def repeat_n_to_k(self, n, k)
A concatenation of N transducers where N is any number from n to k, inclusive.
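The bounded repetition functions can be modelled on finite sets of strings. This is a pure Python sketch of the semantics only, with hypothetical stand-in functions; the unbounded variants (repeat_n_plus, repeat_plus, repeat_star) describe infinite languages and are not modelled here.

```python
def repeat_n(language, n):
    """All concatenations of exactly n strings drawn from language."""
    result = {""}  # zero repetitions yield only the empty string
    for _ in range(n):
        result = {prefix + word for prefix in result for word in language}
    return result

def repeat_n_to_k(language, n, k):
    """All concatenations of N strings, for every N from n to k inclusive."""
    result = set()
    for count in range(n, k + 1):
        result |= repeat_n(language, count)
    return result

def repeat_n_minus(language, n):
    """All concatenations of N strings, for every N from zero to n inclusive."""
    return repeat_n_to_k(language, 0, n)

print(sorted(repeat_n({"a", "b"}, 2)))
print(sorted(repeat_n_to_k({"ab"}, 1, 3)))
print(sorted(repeat_n_minus({"ab"}, 2)))
```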
def repeat_plus(self)
A concatenation of N transducers where N is any number from one to infinity.
def repeat_star(self)
A concatenation of N transducers where N is any number from zero to infinity.
def reverse(self)
Reverse the transducer.
A reversed transducer accepts the string 'n(0) n(1) ... n(N)' iff the original transducer accepts the string 'n(N) n(N-1) ... n(0)'.
def set_final_weights(self, weight)
Set the weights of all final states to weight.
If the HfstTransducer is of unweighted type (hfst.ImplementationType.SFST_TYPE or hfst.ImplementationType.FOMA_TYPE), nothing is done.
def set_name(self, name)
Rename the transducer to name.
def set_property(self, property, value)
Set arbitrary string property property to value.
property | A string naming the property. |
value | A string expressing the value of property. |
set_property('name', 'name of the transducer') equals set_name('name of the transducer').
def shuffle(self, another)
Shuffle this transducer with transducer another.
If transducer A accepts string 'foo' and transducer B string 'bar', the transducer that results from shuffling A and B accepts all strings [(f|b)(o|a)(o|r)].
def substitute(self, s, S=None, kwargs)
Substitute symbols or transitions in the transducer.
s | The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None. |
S | The symbol, transition, a tuple of transitions or a transducer (hfst.HfstTransducer) that substitutes s. |
kwargs | Arguments recognized are 'input' and 'output', their values can be False or True, True being the default. These arguments are valid only if s and S are strings, else they are ignored. |
input | Whether substitution is performed on input side, defaults to True. Valid only if s and S are strings. |
output | Whether substitution is performed on output side, defaults to True. Valid only if s and S are strings. |
For more information, see hfst.HfstBasicTransducer.substitute. The function works similarly, with the exception of argument S, which must be hfst.HfstTransducer instead of hfst.HfstBasicTransducer.
def subtract(self, another)
Subtract transducer another from this transducer.
def write(self, ostr)
Write the transducer in binary format to ostr.
ostr | An hfst.HfstOutputStream where the transducer is written. |
def write_att(self, f, write_weights=True)
Write the transducer in AT&T format to file f. write_weights defines whether weights are written.
f | A Python file where the transducer is written. |
write_weights | Whether weights are written. |
def write_att(self, ofile, write_weights=True)
Write the transducer in AT&T format to file ofile, write_weights defines whether weights are written.
The fields in the resulting AT&T format are separated by tabulator characters.
NOTE: If the transition symbols contain space characters, the spaces are printed as '@_SPACE_@' because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as '@0@'.
If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by hfst.read_att.
An example:
    tr1 = hfst.regex('[foo:bar baz:0 " "]::0.3')
    tr2 = hfst.empty_fst()
    tr3 = hfst.epsilon_fst(0.5)
    tr4 = hfst.regex('[foo]')
    tr5 = hfst.empty_fst()
    f = hfst.hfst_open('testfile.att', 'w')
    for tr in [tr1, tr2, tr3, tr4]:
        tr.write_att(f)
        f.write('--\n')
    tr5.write_att(f)
    f.close()
This will yield a file 'testfile.att' that looks as follows:
    0 1 foo bar 0.299805
    1 2 baz @0@ 0.000000
    2 3 @_SPACE_@ @_SPACE_@ 0.000000
    3 0.000000
    --
    --
    0 0.500000
    --
    0 1 foo foo 0.000000
    1 0.000000
    --
StreamCannotBeWrittenException | |
StreamIsClosedException |
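The formatting rules described above (tab-separated fields, '@_SPACE_@' for spaces, '@0@' for epsilon, "--" between transducers) can be sketched with a small pure Python serializer. It is an illustration only and does not call HFST: it assumes transitions are `(source, target, input, output, weight)` tuples, that epsilon is represented by the empty string, and that all transitions are listed before the final states, which may differ from the line order HFST produces. All helper names are hypothetical.

```python
def att_symbol(symbol):
    """Escape a symbol for AT&T output (the empty string stands for epsilon here)."""
    if symbol == "":
        return "@0@"
    return symbol.replace(" ", "@_SPACE_@")

def to_att(transitions, finals, write_weights=True):
    """Render one transducer as tab-separated AT&T lines."""
    lines = []
    for source, target, isymbol, osymbol, weight in transitions:
        fields = [str(source), str(target), att_symbol(isymbol), att_symbol(osymbol)]
        if write_weights:
            fields.append(f"{weight:.6f}")
        lines.append("\t".join(fields))
    for state, weight in finals:
        fields = [str(state)]
        if write_weights:
            fields.append(f"{weight:.6f}")
        lines.append("\t".join(fields))
    return "\n".join(lines)

def join_att(blocks):
    """Separate several transducers with a line of two hyphens."""
    return "\n--\n".join(blocks)

first = to_att(
    [(0, 1, "foo", "bar", 0.3), (1, 2, "baz", "", 0.0), (2, 3, " ", " ", 0.0)],
    [(3, 0.0)],
)
second = to_att([(0, 1, "foo", "foo", 0.0)], [(1, 0.0)], write_weights=False)
print(join_att([first, second]))
```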
def write_att(self, filename, write_weights=True)
Write the transducer in AT&T format to file named filename.
write_weights defines whether weights are written.
If the file exists, it is overwritten. If the file does not exist, it is created.
def write_prolog(self, f, name, write_weights=True)
Write the transducer in prolog format with name name to file f. write_weights defines whether weights are written.
f | A Python file where the transducer is written. |
name | The name of the transducer that must be given in a prolog file. |
write_weights | Whether weights are written. |