https://bugs.documentfoundation.org/show_bug.cgi?id=167649
--- Comment #2 from László Németh <[email protected]> ---
(Note: an old, but still relevant mail about developing thesauri with
stemming and affixation in English and in other languages:

---------- Forwarded message ---------
From: Németh László <xxx>
Date: Tue, 28 Sep 2010, 2:33
Subject: Re: [lingu-dev] Adding affixation to a thesaurus
To: <[email protected]>

Hi,

[From my previous letters, with new links]:

The new stemming in the OpenOffice.org thesaurus works in most languages
without spelling dictionary modification (for example, the word form
"cats" now has synonyms in English), but morphological generation (for
example, listing the synonym "kitties" instead of "kitty" for "cats" in
English) and word forms without (real) stems need some new dictionary
data.

See issue 19563
(http://www.openoffice.org/issues/show_bug.cgi?id=19563), the Hunspell
manual
(https://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/hunspell4.pdf,
morphological analysis section), and the morphological regression tests,
the analyze tool and the new -s/-m options of the hunspell executable in
the Hunspell distribution.

The standalone OpenOffice.org MyThes thesaurus has a configuration
option to test your thesaurus with stemming and affixation:

https://sourceforge.net/projects/hunspell/files/MyThes/1.2.1/mythes-1.2.1.tar.gz

See README.NEW and README for compiling.

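The difference between plain stemming and morphological generation described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: it does not call Hunspell or MyThes, the two-word thesaurus and the plural rules are made up, and all function names are hypothetical. Only the "cats"/"kitty"/"kitties" example and the "is:plur" tag name come from the mail.

```python
# Toy illustration only: plain Python, no Hunspell or MyThes calls.
# The data, rule table and function names are hypothetical.

# Thesaurus keyed by stem, as in a thesaurus without morphological data.
THESAURUS = {"cat": ["kitty"], "kitty": ["cat"]}

# Hypothetical plural rules: (suffix to strip, suffix to add, required ending).
# The first matching rule wins.
PLURAL_RULES = [
    ("y", "ies", "y"),  # kitty -> kitties
    ("", "s", ""),      # cat -> cats (fallback)
]


def generate_plural(stem):
    """Build the plural form of a stem with the toy rules."""
    for strip, add, ending in PLURAL_RULES:
        if stem.endswith(ending):
            return stem[: len(stem) - len(strip)] + add
    return stem


def analyze(word):
    """Return (stem, tag) for a known word form, or (None, None)."""
    if word in THESAURUS:
        return word, None
    for stem in THESAURUS:
        if generate_plural(stem) == word:
            return stem, "is:plur"
    return None, None


def synonyms_stem_only(word):
    """Stemming only: synonyms are listed in their base form."""
    stem, _tag = analyze(word)
    return list(THESAURUS.get(stem, []))


def synonyms_with_generation(word):
    """Stemming plus generation: synonyms take the form of the input word."""
    stem, tag = analyze(word)
    synonyms = THESAURUS.get(stem, [])
    if tag == "is:plur":
        return [generate_plural(s) for s in synonyms]
    return list(synonyms)


print(synonyms_stem_only("cats"))        # ['kitty']
print(synonyms_with_generation("cats"))  # ['kitties']
```

In the real setup the analysis and generation steps are done by Hunspell from the morphological fields in the dictionary, not by a hand-written rule table like the one here.
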
Test example

Make an input.txt file with two lines, "rodents" and "consumed", and run
MyThes with the test dictionary:

  ./example morph.idx morph.dat input.txt morph.aff morph.dic
  Thesaurus uses encoding ISO8859-1

  stem: rodent
  rodent has 1 meanings
     meaning 0: (n)
         mouse
         mice

  stem: consume
  consume has 1 meanings
     meaning 0: (v)
         eat
         eaten, ate
         ingested

The example Hunspell dictionary (meanings of the morphological fields:

  po: part of speech category
  ts: terminal suffix
  al: allomorph
  st: stem
  is: inflectional suffix,

see
http://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754#Morphological%20analysis):

  $ cat morph.dic
  8
  rodent/S po:n ts:nom
  mouse po:n al:mice ts:nom
  mice po:n st:mouse is:plur
  consume/TQD po:v ts:present
  ingest/TQD po:v ts:present
  eat/QT po:v al:ate al:eaten ts:present
  ate po:v st:eat is:past_1
  eaten po:v st:eat is:past_2

  $ cat morph.aff
  # example for morphological analysis, stemming and generation

  SFX D Y 4
  SFX D 0 ed [^e] is:past_1
  SFX D 0 d e is:past_1
  SFX D 0 ed [^e] is:past_2
  SFX D 0 d e is:past_2

  SFX S Y 1
  SFX S 0 s . is:plur

  SFX Q Y 1
  SFX Q 0 s . is:sg_3

  SFX T Y 2
  SFX T 0 ing [^e] is:pr_part
  SFX T e ing e is:pr_part

and the thesaurus (without any extra morphological information):

  $ cat morph.dat
  ISO8859-1
  mouse|1
  (n)|rodent
  rodent|1
  (n)|mouse
  eat|1
  (v)|consume|ingest
  consume|1
  (v)|eat|ingest
  ingest|1
  (v)|eat|consume

Regards,
László

2010/9/27 Andrea Pescetti <xxx>:
> Reading http://www.openoffice.org/issues/show_bug.cgi?id=114774 I
> understood that the OOo thesaurus supports affixation, i.e., that if
> "river" admits "stream" as a synonym, then looking for a synonym of
> "rivers" will bring up "streams".
>
> Now, this never worked in the Italian thesaurus. Only the base form is
> proposed. I mean, if "piccolo" (Italian for "small") admits
> "limitato" (Italian for "limited") as a synonym, looking for synonyms of
> the plural form "piccoli" does not show the plural "limitati", but the
> base form "limitato".
> And this happens for all words, in OOo 3.2.1 too,
> where the English thesaurus has the affixation working and is unaffected
> by the issue mentioned above.
>
> It should thus be possible to improve the Italian thesaurus so that it
> supports affixation like the English one. Can anybody point me to some
> resources on how to do it? I had a look at
> http://lingucomponent.openoffice.org/thesaurus.html but I wasn't able to
> find an answer there.
>
> Thanks,
> Andrea Pescetti - Italian N-L Project Lead.)

For thesaurus development, the latest MyThes distribution with stemming
and affixation:

https://sourceforge.net/projects/hunspell/files/MyThes/1.2.4/mythes-1.2.4.tar.gz

-- 
You are receiving this mail because:
You are the assignee for the bug.
