HFST - Helsinki Finite-State Transducer Technology - C++ API
version 3.9.1
The HFST API is currently implemented with three finite-state libraries: SFST, OpenFst and foma. The API is designed so that adding a new implementation is relatively easy. There are some places in the code that you must modify, but they are all clearly marked with comments in the files. Most of these modifications just make HFST aware that a new implementation is available, and they are straightforward to carry out. The new backend implementation itself is written in a separate file and must fulfil an interface common to all backend implementations. This interface defines functions that create and operate on transducers, as well as datatypes for reading and writing binary transducers. It is what lets the different finite-state libraries cooperate with the HFST API that is visible to the end user.
We first describe the functions and datatypes that your implementation must offer so that it can be connected to the HFST API. We then go through the modifications that you must make to the code when adding your implementation to the HFST API. All these modifications are also indicated in comments in the code, so we do not cover every change here; refer to the files themselves. Finally, we describe the changes that must be made to the configuration file.
The directory libhfst/src/implementations contains two files for each library that has been added to HFST. For instance, for SFST there are the files SfstTransducer.h and SfstTransducer.cc. If a library offers more than one transducer format, each format has its own files. For example, for OpenFst there are TropicalWeightTransducer.h and .cc and LogWeightTransducer.h and .cc. Each pair of files contains three static classes that act as an interface between HFST and the finite-state library in question. For instance, the files FomaTransducer.h and FomaTransducer.cc contain the classes FomaTransducer, FomaInputStream and FomaOutputStream, which take care of interoperation between the foma library and HFST.
The directory also contains the skeleton files MyTransducerLibraryTransducer.h and MyTransducerLibraryTransducer.cc. These files define a group of static classes, MyTransducerLibraryTransducer, MyTransducerLibraryInputStream and MyTransducerLibraryOutputStream, that contain functions operating on transducers and streams. These classes act as an interface between HFST and your library, and you should write your implementation in these files. The assumption is that most of the functionality is already found in the finite-state library that you are using, and you only have to adapt it slightly so that it can be accessed via a standardized interface that works similarly for all implementations.
All functions in the skeleton files throw a FunctionNotImplementedException, as they have no implementation yet. When you start writing your own implementation, you can likewise throw the same exception from every function that you have not implemented yet.
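As an illustration of that incremental approach, here is a minimal sketch of a skeleton class in which one function is already implemented and another still throws. The MyFst type, its contents and the FunctionNotImplementedException class defined here are hypothetical stand-ins, not HFST's real definitions (inside HFST the exception is raised with HFST_THROW, as in the examples later in this section).

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration only; they are not HFST's
// actual MyFst type or exception class.
namespace my_namespace {
struct MyFst {
    std::vector<unsigned int> final_states;  // placeholder contents
};
}  // namespace my_namespace

struct FunctionNotImplementedException : std::runtime_error {
    FunctionNotImplementedException()
        : std::runtime_error("function not implemented") {}
};

// Skeleton backend class: only static functions, filled in gradually.
class MyTransducerLibraryTransducer {
public:
    // Implemented: an empty transducer accepts nothing, so it has no
    // final states.
    static my_namespace::MyFst * create_empty_transducer() {
        return new my_namespace::MyFst();
    }

    // Not implemented yet: keep throwing until real code is written.
    static my_namespace::MyFst * remove_epsilons(my_namespace::MyFst * t) {
        (void)t;
        throw FunctionNotImplementedException();
    }
};
```

Because every unimplemented function fails loudly instead of silently returning something, it is easy to see which parts of the backend still need work.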
In the same directory, the files ConvertTransducerFormat.h and ConvertTransducerFormat.cc contain functions that convert between HFST's own transducer format, hfst::implementations::HfstBasicTransducer, and the transducer formats of the different implementations. Add functions there that convert between HfstBasicTransducer and your transducer class (replace MyFst with the name of your transducer class, my_namespace with the namespace where it is defined, and "my_transducer_library" with the name of your transducer library or some other descriptive name):
#if HAVE_MY_TRANSDUCER_LIBRARY
static HfstBasicTransducer * my_transducer_library_transducer_to_hfst_basic_transducer
  (my_namespace::MyFst * t);

static my_namespace::MyFst * hfst_basic_transducer_to_my_transducer_library_transducer
  (const HfstBasicTransducer * t);
#endif // HAVE_MY_TRANSDUCER_LIBRARY
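The following self-contained sketch outlines what the two conversion directions might involve when the backend stores symbols as integer ids: one direction maps ids back to symbol strings, the other builds an alphabet while copying the transitions. BasicTransducer, BasicTransition, MyArc and MyFst are simplified hypothetical types; the real HfstBasicTransducer has a different and richer interface, so treat this only as an outline of the logic.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified "basic" format: transitions carry symbol strings.
struct BasicTransition {
    unsigned int source, target;
    std::string input, output;
};
struct BasicTransducer {
    std::vector<BasicTransition> transitions;
    std::set<unsigned int> finals;
};

namespace my_namespace {
// Hypothetical backend format: transitions carry integer symbol ids.
struct MyArc {
    unsigned int source, target, input, output;
};
struct MyFst {
    std::vector<std::string> alphabet;  // symbol id -> symbol string
    std::vector<MyArc> arcs;
    std::set<unsigned int> finals;
};
}  // namespace my_namespace

// Backend -> basic: translate every symbol id back to its string.
static BasicTransducer * my_transducer_library_transducer_to_hfst_basic_transducer
  (my_namespace::MyFst * t) {
    BasicTransducer * basic = new BasicTransducer();
    for (const my_namespace::MyArc & a : t->arcs) {
        basic->transitions.push_back(
            {a.source, a.target, t->alphabet[a.input], t->alphabet[a.output]});
    }
    basic->finals = t->finals;
    return basic;
}

// Basic -> backend: give each distinct symbol string its own id.
static my_namespace::MyFst * hfst_basic_transducer_to_my_transducer_library_transducer
  (const BasicTransducer * t) {
    my_namespace::MyFst * fst = new my_namespace::MyFst();
    std::map<std::string, unsigned int> ids;
    auto id_of = [&](const std::string & symbol) -> unsigned int {
        auto it = ids.find(symbol);
        if (it != ids.end()) return it->second;
        unsigned int id = static_cast<unsigned int>(fst->alphabet.size());
        fst->alphabet.push_back(symbol);
        ids[symbol] = id;
        return id;
    };
    for (const BasicTransition & tr : t->transitions) {
        unsigned int in = id_of(tr.input);    // separate statements fix the
        unsigned int out = id_of(tr.output);  // order in which ids are assigned
        fst->arcs.push_back({tr.source, tr.target, in, out});
    }
    fst->finals = t->finals;
    return fst;
}
```

Both functions return newly allocated objects, so the caller owns the result and must delete it.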
Also add the following lines to libhfst/src/implementations/Makefile.am:
if WANT_MY_TRANSDUCER_LIBRARY
MAYBE_MY_TRANSDUCER_LIBRARY=MyTransducerLibraryTransducer.cc
endif
and add the variable to the list of source files:
BRIDGE_SRCS=$(MAYBE_SFST) $(MAYBE_OPENFST) $(MAYBE_FOMA) $(MAYBE_HFSTOL) $(MAYBE_MY_TRANSDUCER_LIBRARY)
<H2>Making HFST aware of your implementation</H2>
When you have written your implementation, you must connect it to HFST. In the file HfstDataTypes.h, the enum ImplementationType lists all possible HfstTransducer implementation types. It needs a new enumerator:
MY_TRANSDUCER_LIBRARY_TYPE,
In the file HfstTransducer.h, you must include the header file MyTransducerLibraryTransducer.h:
#if HAVE_MY_TRANSDUCER_LIBRARY
#include "implementations/MyTransducerLibraryTransducer.h"
#endif
and declare that you are using the static class MyTransducerLibraryTransducer:
#if HAVE_MY_TRANSDUCER_LIBRARY
using hfst::implementations::MyTransducerLibraryTransducer;
#endif // #if HAVE_MY_TRANSDUCER_LIBRARY
You must add the transducer type of the finite-state library that you are using to TransducerImplementation, the union of possible transducer backend implementations:
#if HAVE_MY_TRANSDUCER_LIBRARY
hfst::implementations::MyFst * my_transducer_library;
#endif
The transducer type is the only part of the new finite-state library that HFST is directly aware of. All other functionality is accessed through the classes MyTransducerLibraryTransducer, MyTransducerLibraryInputStream and MyTransducerLibraryOutputStream.
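A reduced sketch of this tagged-union idea: a type tag records which union member is active, and any code that reads the backend pointer switches on the tag first. The enum, the union, the TaggedTransducer struct and the number_of_states helper below are hypothetical simplifications of ImplementationType and TransducerImplementation, with one made-up "other" backend standing in for the existing ones.

```cpp
#include <stdexcept>

// Hypothetical backend types; each stores only a state count here.
namespace other_lib { struct OtherFst { int n_states = 1; }; }
namespace my_namespace { struct MyFst { int n_states = 1; }; }

// Simplified stand-ins for ImplementationType and TransducerImplementation.
enum ImplementationType { OTHER_LIB_TYPE, MY_TRANSDUCER_LIBRARY_TYPE, ERROR_TYPE };

union TransducerImplementation {
    other_lib::OtherFst * other;
    my_namespace::MyFst * my_transducer_library;  // the newly added member
};

struct TaggedTransducer {
    ImplementationType type;                  // says which member is active
    TransducerImplementation implementation;  // only a pointer is stored
};

// Reading an inactive union member is undefined behaviour, so every
// access checks the type tag first.
int number_of_states(const TaggedTransducer & t) {
    switch (t.type) {
    case OTHER_LIB_TYPE:
        return t.implementation.other->n_states;
    case MY_TRANSDUCER_LIBRARY_TYPE:
        return t.implementation.my_transducer_library->n_states;
    default:
        throw std::logic_error("transducer has wrong type");
    }
}
```

Since the union only holds a pointer, adding a backend changes its size not at all; the only new knowledge the generic code needs is the pointer type.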
You also need an interface object of the class MyTransducerLibraryTransducer, declared as a static member of HfstTransducer:
#if HAVE_MY_TRANSDUCER_LIBRARY
static hfst::implementations::MyTransducerLibraryTransducer my_transducer_library_interface;
#endif
In the file HfstTransducer.cc, you must define the interface between HFST and your transducer library:
#if HAVE_MY_TRANSDUCER_LIBRARY
hfst::implementations::MyTransducerLibraryTransducer
  HfstTransducer::my_transducer_library_interface;
#endif
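Both steps are needed because, in C++ before C++17 inline variables, a static data member declared inside a class is only a declaration: exactly one translation unit must also define it, otherwise the linker reports an undefined reference when the member is used. The sketch below collapses the header part and the .cc part into one block; HfstTransducerSketch, backend_name and name() are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical minimal stand-in for the backend interface class.
class MyTransducerLibraryTransducer {
public:
    static std::string name() { return "my_transducer_library"; }
};

// "Header" part: the static member is only declared inside the class.
class HfstTransducerSketch {
public:
    static MyTransducerLibraryTransducer my_transducer_library_interface;

    std::string backend_name() const {
        // Static member functions can be called through the interface object.
        return my_transducer_library_interface.name();
    }
};

// ".cc" part: the single out-of-class definition of the static member.
MyTransducerLibraryTransducer HfstTransducerSketch::my_transducer_library_interface;
```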
In the file HfstInputStream.h, the union StreamImplementation needs a new member:
#if HAVE_MY_TRANSDUCER_LIBRARY
hfst::implementations::MyTransducerLibraryInputStream * my_transducer_library;
#endif
and so does the enum TransducerType:
MY_TRANSDUCER_LIBRARY_, /* Your transducer type */
In the file HfstOutputStream.h, the union StreamImplementation needs a new member:
#if HAVE_MY_TRANSDUCER_LIBRARY
hfst::implementations::MyTransducerLibraryOutputStream * my_transducer_library;
#endif
The function declarations in the file hfst_apply_schemas.h and their implementations in the file HfstApply.cc need an additional argument: a pointer to the corresponding function of the new implementation. See the comments in those files for more information.
For all functions, constructors and destructors of classes HfstTransducer, HfstInputStream and HfstOutputStream as well as functions defined in file HfstApply.cc, you must add a piece of code that calls the implementation of that functionality in the class MyTransducerLibraryTransducer, MyTransducerLibraryInputStream or MyTransducerLibraryOutputStream. For some functions you have to call two or more MyTransducerLibraryTransducer functions. However, usually more complex functions are implemented with HFST API basic functions, so they do not have to be implemented separately for each library. By default all functionalities throw a FunctionNotImplementedException if the implementation type requested is not handled as a separate case in the function. This should make it easy for you to start adding your implementations gradually.
You should carefully go through the files HfstTransducer.cc, HfstInputStream.cc, HfstOutputStream.cc and HfstApply.cc and, for each functionality, add a case that calls the implementation of the new finite-state library when the implementation type requires it. Here are some examples of the pattern that is used in HFST to handle the different cases and choose the right implementation.
An example of a constructor that creates an empty HfstTransducer:
HfstTransducer::HfstTransducer(ImplementationType type):
  type(type), anonymous(false), is_trie(true), name("")
{
  if (not is_implementation_type_available(type))
    HFST_THROW(ImplementationTypeNotAvailableException);

  switch (type)
    {
#if HAVE_SFST
    case SFST_TYPE:
      implementation.sfst = sfst_interface.create_empty_transducer();
      break;
#endif
#if HAVE_OPENFST
    case TROPICAL_OFST_TYPE:
      implementation.tropical_ofst =
        tropical_ofst_interface.create_empty_transducer();
      this->type = TROPICAL_OFST_TYPE;
      break;
    case LOG_OFST_TYPE:
      implementation.log_ofst =
        log_ofst_interface.create_empty_transducer();
      break;
#endif
#if HAVE_FOMA
    case FOMA_TYPE:
      implementation.foma = foma_interface.create_empty_transducer();
      break;
#endif
    /* Add here your implementation. */
    //#if HAVE_MY_TRANSDUCER_LIBRARY
    //case MY_TRANSDUCER_LIBRARY_TYPE:
    //  implementation.my_transducer_library
    //    = my_transducer_library_interface.create_empty_transducer();
    //  break;
    //#endif
    case HFST_OL_TYPE:
    case HFST_OLW_TYPE:
      implementation.hfst_ol = hfst_ol_interface.create_empty_transducer
        (type==HFST_OLW_TYPE?true:false);
      break;
    case ERROR_TYPE:
      HFST_THROW(TransducerHasWrongTypeException);
    default:
      HFST_THROW(FunctionNotImplementedException);
    }
}
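The same dispatch pattern, reduced to a self-contained sketch with a single hypothetical backend: the constructor handles the types it has cases for and lets every other type fall through to default, which throws FunctionNotImplementedException. The Transducer class, the enum and the exception are simplified stand-ins rather than HFST's own, and std::invalid_argument stands in for TransducerHasWrongTypeException.

```cpp
#include <stdexcept>

// Hypothetical, reduced stand-ins; not HFST's own definitions.
namespace my_namespace { struct MyFst { int n_states = 1; }; }

struct FunctionNotImplementedException : std::runtime_error {
    FunctionNotImplementedException()
        : std::runtime_error("function not implemented") {}
};

enum ImplementationType { FOMA_TYPE, MY_TRANSDUCER_LIBRARY_TYPE, ERROR_TYPE };

struct MyTransducerLibraryTransducer {
    static my_namespace::MyFst * create_empty_transducer() {
        return new my_namespace::MyFst();  // one start state, nothing else
    }
};

class Transducer {
public:
    explicit Transducer(ImplementationType type) : type(type), my_fst(nullptr) {
        switch (type) {
        case MY_TRANSDUCER_LIBRARY_TYPE:
            my_fst = MyTransducerLibraryTransducer::create_empty_transducer();
            break;
        case ERROR_TYPE:
            throw std::invalid_argument("transducer has wrong type");
        default:
            // FOMA_TYPE has no case in this sketch, so it ends up here, just
            // as a backend without its own case does until one is added.
            throw FunctionNotImplementedException();
        }
    }
    ~Transducer() { delete my_fst; }
    Transducer(const Transducer &) = delete;              // owns a raw pointer,
    Transducer & operator=(const Transducer &) = delete;  // so no copying

    ImplementationType type;
    my_namespace::MyFst * my_fst;
};
```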
Many functions call a function in the file HfstApply.cc that takes as parameters pointers to the corresponding functions of all available backend implementations. For example, the function remove_epsilons
HfstTransducer &HfstTransducer::remove_epsilons()
{
  is_trie = false;
  return apply(
#if HAVE_SFST
    &hfst::implementations::SfstTransducer::remove_epsilons,
#endif
#if HAVE_OPENFST
    &hfst::implementations::TropicalWeightTransducer::remove_epsilons,
    &hfst::implementations::LogWeightTransducer::remove_epsilons,
#endif
#if HAVE_FOMA
    &hfst::implementations::FomaTransducer::remove_epsilons,
#endif
    /* Add here your implementation. */
    //#if HAVE_MY_TRANSDUCER_LIBRARY
    //&hfst::implementations::MyTransducerLibraryTransducer::remove_epsilons,
    //#endif
    false );
}
calls the function
HfstTransducer &apply(
#if HAVE_SFST
  SFST::Transducer * (*sfst_funct)(SFST::Transducer *),
#endif
#if HAVE_OPENFST
  fst::StdVectorFst * (*tropical_ofst_funct)(fst::StdVectorFst *),
  hfst::implementations::LogFst * (*log_ofst_funct)(hfst::implementations::LogFst *),
#endif
#if HAVE_FOMA
  fsm * (*foma_funct)(fsm *),
#endif
  /* Add your library here */
  //#if HAVE_MY_TRANSDUCER_LIBRARY
  //my_namespace::MyFst * (*my_transducer_library_funct)(my_namespace::MyFst *),
  //#endif
  bool dummy /* makes sure there is always a parameter after the function pointer parameters,
              * so commas between parameters are easier to handle */
  );
Then the function 'apply' chooses the right function pointer to use according to the type of the transducer:
HfstTransducer &HfstTransducer::apply(
#if HAVE_SFST
  SFST::Transducer * (*sfst_funct)(SFST::Transducer *),
#endif
#if HAVE_OPENFST
  fst::StdVectorFst * (*tropical_ofst_funct)(fst::StdVectorFst *),
  hfst::implementations::LogFst * (*log_ofst_funct)(hfst::implementations::LogFst *),
#endif
#if HAVE_FOMA
  fsm * (*foma_funct)(fsm *),
#endif
  /* Add your library. */
  //#if HAVE_MY_TRANSDUCER_LIBRARY
  //my_namespace::MyFst * (*my_transducer_library_funct)(my_namespace::MyFst *),
  //#endif
  bool foo )
{
  (void)foo;
  switch(this->type)
    {
#if HAVE_SFST
    case SFST_TYPE:
      {
        SFST::Transducer * sfst_temp = sfst_funct(implementation.sfst);
        delete implementation.sfst;
        implementation.sfst = sfst_temp;
        break;
      }
#endif
#if HAVE_OPENFST
    case TROPICAL_OFST_TYPE:
      {
        fst::StdVectorFst * tropical_ofst_temp =
          tropical_ofst_funct(implementation.tropical_ofst);
        delete implementation.tropical_ofst;
        implementation.tropical_ofst = tropical_ofst_temp;
        break;
      }
    case LOG_OFST_TYPE:
      {
        hfst::implementations::LogFst * log_ofst_temp =
          log_ofst_funct(implementation.log_ofst);
        delete implementation.log_ofst;
        implementation.log_ofst = log_ofst_temp;
        break;
      }
#endif
#if HAVE_FOMA
    case FOMA_TYPE:
      {
        fsm * foma_temp = foma_funct(implementation.foma);
        this->foma_interface.delete_foma(implementation.foma);
        implementation.foma = foma_temp;
        break;
      }
#endif
    /* Add your library here. */
    //#if HAVE_MY_TRANSDUCER_LIBRARY
    //case MY_TRANSDUCER_LIBRARY_TYPE:
    //  {
    //    my_namespace::MyFst * my_fst_temp =
    //      my_transducer_library_funct(implementation.my_transducer_library);
    //    delete implementation.my_transducer_library;
    //    implementation.my_transducer_library = my_fst_temp;
    //    break;
    //  }
    //#endif
    case ERROR_TYPE:
    default:
      HFST_THROW(TransducerHasWrongTypeException);
    }
  return *this;
}
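A compact, self-contained model of the apply pattern with two hypothetical backends may make the control flow easier to follow: each backend function returns a new transducer, apply stores the result and frees the old one, and the trailing dummy parameter is kept for the reason given in the declaration of apply. The backend types, the epsilon_free flag and the *_remove_epsilons functions are placeholders; a real backend would run its own epsilon-removal algorithm.

```cpp
#include <stdexcept>

// Hypothetical backend types; epsilon_free stands in for real FST contents.
namespace other_lib { struct OtherFst { bool epsilon_free = false; }; }
namespace my_namespace { struct MyFst { bool epsilon_free = false; }; }

enum ImplementationType { OTHER_LIB_TYPE, MY_TRANSDUCER_LIBRARY_TYPE };

// Backend functions return a NEW transducer; apply() frees the old one.
other_lib::OtherFst * other_remove_epsilons(other_lib::OtherFst * t) {
    other_lib::OtherFst * result = new other_lib::OtherFst(*t);
    result->epsilon_free = true;  // placeholder for the real algorithm
    return result;
}
my_namespace::MyFst * my_remove_epsilons(my_namespace::MyFst * t) {
    my_namespace::MyFst * result = new my_namespace::MyFst(*t);
    result->epsilon_free = true;  // placeholder for the real algorithm
    return result;
}

class Transducer {
public:
    ImplementationType type;
    union {
        other_lib::OtherFst * other;
        my_namespace::MyFst * my_transducer_library;
    } implementation;

    explicit Transducer(ImplementationType t) : type(t) {
        if (t == OTHER_LIB_TYPE) implementation.other = new other_lib::OtherFst();
        else implementation.my_transducer_library = new my_namespace::MyFst();
    }
    ~Transducer() {
        if (type == OTHER_LIB_TYPE) delete implementation.other;
        else delete implementation.my_transducer_library;
    }
    Transducer(const Transducer &) = delete;
    Transducer & operator=(const Transducer &) = delete;

    // One function pointer per backend, then a trailing dummy parameter so
    // that every pointer parameter can end with a comma.
    Transducer & apply(
        other_lib::OtherFst * (*other_funct)(other_lib::OtherFst *),
        my_namespace::MyFst * (*my_transducer_library_funct)(my_namespace::MyFst *),
        bool dummy) {
        (void)dummy;
        switch (type) {
        case OTHER_LIB_TYPE: {
            other_lib::OtherFst * temp = other_funct(implementation.other);
            delete implementation.other;
            implementation.other = temp;
            break;
        }
        case MY_TRANSDUCER_LIBRARY_TYPE: {
            my_namespace::MyFst * temp =
                my_transducer_library_funct(implementation.my_transducer_library);
            delete implementation.my_transducer_library;
            implementation.my_transducer_library = temp;
            break;
        }
        default:
            throw std::logic_error("transducer has wrong type");
        }
        return *this;
    }

    Transducer & remove_epsilons() {
        return apply(&other_remove_epsilons, &my_remove_epsilons, false);
    }
};
```

Passing function pointers keeps each public operation (here remove_epsilons) down to a single call, while the switch on the type tag lives in one place.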
Finally, in the file libhfst/src/Makefile.am, you must add your header file to the list HFST_HDRS:
implementations/MyTransducerLibraryTransducer.h
<H2>Configuring</H2>
The configuration must be aware of the new implementation and the finite-state library. Add the following pieces of code to the file configure.ac (change "MY_TRANSDUCER_LIBRARY" etc. to the name of your transducer library):
AC_ARG_WITH([my_transducer_library],
            [AS_HELP_STRING([--with-my-transducer-library],
                            [process unweighted fsts with my transducer library @<:@default=no@:>@])],
            [],
            [with_my_transducer_library=no])
AS_IF([test "x$with_my_transducer_library" != xno],
      [AC_DEFINE([HAVE_MY_TRANSDUCER_LIBRARY], [1],
                 [Define to compile my transducer library support in HFST])])
AM_CONDITIONAL([WANT_MY_TRANSDUCER_LIBRARY], [test x$with_my_transducer_library != xno])
AS_IF([test "x$with_my_transducer_library" != "xno"],
      [AC_CHECK_LIB([my_transducer_library], [main], [],
                    [AC_MSG_FAILURE([my transducer library test failed (--without-my-transducer-library to disable)])])])
AS_IF([test "x$with_my_transducer_library" != "xno"], [AC_CHECK_HEADERS([my_transducer_library/MyTransducerLibrary.h])])
* with my transducer library: $with_my_transducer_library
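The AC_DEFINE call above makes HAVE_MY_TRANSDUCER_LIBRARY visible to the C++ sources (typically through the generated config header), and that macro is what all the #if HAVE_MY_TRANSDUCER_LIBRARY guards in the code test. Below is a sketch of how such a macro can gate an availability check like the is_implementation_type_available call in the constructor example; the function body is an assumption for illustration, not HFST's implementation, and the macro is defined by hand here in place of configure.

```cpp
// Normally supplied by configure when --with-my-transducer-library is
// given; defined by hand here for illustration.
#define HAVE_MY_TRANSDUCER_LIBRARY 1
// HAVE_SFST is deliberately left undefined: inside #if, an undefined
// identifier evaluates to 0, so the SFST case below is compiled out.

enum ImplementationType { SFST_TYPE, MY_TRANSDUCER_LIBRARY_TYPE, ERROR_TYPE };

// Hypothetical sketch of an availability check driven by the macros.
bool is_implementation_type_available(ImplementationType type) {
    switch (type) {
#if HAVE_SFST
    case SFST_TYPE:
        return true;
#endif
#if HAVE_MY_TRANSDUCER_LIBRARY
    case MY_TRANSDUCER_LIBRARY_TYPE:
        return true;
#endif
    default:
        return false;
    }
}
```

A backend that was not enabled at configure time simply has no case, so requesting it reports the type as unavailable instead of failing to compile.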