2013/5/5 RGB ES <rgb.m...@gmail.com>

> 2013/5/5 Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>
>
>> Hello my dear ones,
>>
>> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
>> JZA.
>>
>> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>>
>> JZA told me the files were in TXT format and gave me a URL with some
>> information, but I took a quick look and didn't find anything about the
>> data dictionary of the thesaurus.
>>
>> The tool will be called "Proofing Tool GUI" and will be coded in
>> PureBasic. Is this a good name? PureBasic can compile for
>> Windows/Linux/Mac/Amiga.
>>
>> The reason I want to code it is that months ago I contacted my
>> friends at Minho University in Portugal, who are in charge of PT-pt. I
>> wanted to send them words to be used as synonyms, but they didn't know
>> how to add them.
>>
>> This made me think that there isn't a tool for doing that, so my idea is
>> good because it can be used by the whole community of developers.
>>
>> I unzipped the Portuguese .OXT and grabbed the files:
>> - th_pt_PT.idx
>> - th_pt_PT.dat
>>
>> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format
>> but didn't completely understand how they work.
>>
>> For example, in the *.idx* one I had:
>> UTF-8
>> 12940
>> 1|6
>> a cerca de|16097
>> a começar de|19986
>> a favor|32934
>> a partir de|67469
>> a respeito de|77248
>> ... etc...
>>
>> In the *.dat* one I had:
>> UTF-8
>> 1|3
>> -|anuviado
>> -|aperitivo
>> -|sigla
>> ababelado|1
>> -|atrapalhado|baralhado|atarantado|desnorteado
>> ababelar|1
>> -|baralhar|atrapalhar
>> abaçanado|1
>> ... etc...
>>
>> It seems there are at least three levels of synonyms in the *.dat* one,
>> but I don't know how to interpret them if I create a GUI.
>>
>> Also, the *.idx* one contains numbers whose meaning I don't understand.
>>
>> Is there a URL which explains every detail of those files?
>>
>> Thanks!
>>
>> Kind regards from,
>> Marco A.G.Pinto
>> -----------------------
>
> AFAIK, most AOO thesauri are based on OpenThesaurus
>
> http://sourceforge.net/projects/openthesaurus/
The right URL is https://github.com/danielnaber/openthesaurus

> which is already a working web interface to add words to a thesaurus
> database that can be exported to several formats, including the one used
> by AOO.
>
> There are localized projects that use openthes, like
>
> http://openthesaurus.caixamagica.pt/
> http://openthes-es.berlios.de/
> http://synonimy.sourceforge.net/
> http://www.openthesaurus.de/
> http://www.openthesaurus.tk
> http://synonymer.merg.net/
>
> The PT site seems quite old, but maybe you can find some tips there.
>
> There is an old article from Bruce Byfield here
>
> http://archive09.linux.com/articles/51675?tid=93
>
> The problem with thesauri and dictionaries in general is that they are
> far more than a simple list of words: you need to tell the system the
> possible variants, whether a word is a noun or a verb, whether it's a
> real synonym or just a similar word...
>
> Regards
> Ricardo
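
On Marco's question about the two files: here is a minimal sketch of how they appear to fit together, assuming they follow the MyThes thesaurus layout (that is my understanding, not something confirmed in this thread). In the .dat file, after the encoding line, each entry would be a "word|N" line followed by N meaning lines of the form "pos|synonym|synonym|...", so the "-" in the sample would be an empty part-of-speech tag rather than a third level. In the .idx file, the second line would be the number of entries, and the number after each word the byte offset of that word's entry in the .dat file. The names parse_thesaurus_dat and SAMPLE are made up for illustration, and Python is used only for brevity. The sample is the fragment Marco posted, cut before the truncated "abaçanado" entry.

```python
# Sketch only: assumes the MyThes-style layout described above.
SAMPLE = """UTF-8
1|3
-|anuviado
-|aperitivo
-|sigla
ababelado|1
-|atrapalhado|baralhado|atarantado|desnorteado
ababelar|1
-|baralhar|atrapalhar
"""


def parse_thesaurus_dat(text):
    """Return (encoding, {headword: [(pos, [synonyms]), ...]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    encoding = lines[0]          # first line names the file encoding
    entries = {}
    i = 1
    while i < len(lines):
        # Entry header: "headword|number_of_meaning_lines"
        headword, _, count = lines[i].rpartition("|")
        n = int(count)
        meanings = []
        for raw in lines[i + 1 : i + 1 + n]:
            # Meaning line: "pos|syn1|syn2|..."
            pos, *synonyms = raw.split("|")
            meanings.append((pos, synonyms))
        entries[headword] = meanings
        i += 1 + n               # skip the header and its meaning lines
    return encoding, entries


encoding, entries = parse_thesaurus_dat(SAMPLE)
print(entries["ababelado"])

# Byte offset of the first entry in the .dat data; under the assumed
# layout this is the number stored next to "1" in the .idx file.
first_offset = SAMPLE.encode("utf-8").find(b"1|3")
print(first_offset)
```

If that reading is right, a GUI editor would rewrite the .dat file from a structure like `entries` and then regenerate the .idx by recording the byte offset at which each entry starts.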