2013/5/5 Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>

> Hello my dear ones,
>
> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
> JZA.
>
> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
> JZA told me the files were in TXT format and gave me a URL with some
> information, but I took a quick look and didn't find anything about the
> data dictionary of the thesaurus.
>
> The tool will be called "Proofing Tool GUI" and will be coded in
> PureBasic. Is this a good name? PureBasic can compile for
> Windows/Linux/Mac/Amiga.
>
> The reason I want to code it is that months ago I contacted my friends
> at Minho University in Portugal, who are in charge of PT-pt. I wanted to
> send them words to be used as synonyms, but they didn't know how to add
> them.
>
> This made me think that there isn't a tool for doing that, so my idea is
> good because it can be used by the whole community of developers.
>
> I unzipped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
>
> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format
> but didn't completely understand how they work.
>
> For example, in the *.idx* one I had:
> UTF-8
> 12940
> 1|6
> a cerca de|16097
> a começar de|19986
> a favor|32934
> a partir de|67469
> a respeito de|77248
> ... etc...
>
> In the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
> ... etc...
>
> It seems there are at least three levels of synonyms in the *.dat* one,
> but I don't know how to interpret them if I create a GUI.
>
> Also, in the *.idx* one there are numbers whose meaning I don't
> understand.
>
> Is there a URL which explains every detail of those files?
>
> Thanks!
>
> Kind regards from,
> Marco A.G.Pinto
> -----------------------
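On the file layout itself: the quoted sample is consistent with the MyThes-style thesaurus layout, and the sketch below reads it that way. Treat this as an assumption rather than a specification. Under that reading, a *.dat* headword line is `word|N`, followed by N meaning lines of the form `tag|synonym|synonym|...` (the leading `-` would be a part-of-speech or category tag), and each *.idx* line is `word|byte offset of that headword line in the .dat file`, preceded by the encoding name and the number of entries. The function names `parse_dat` and `index_offsets` are made up for this illustration.

```python
# Sample copied from the .dat excerpt quoted above (first nine lines).
SAMPLE = (
    "UTF-8\n"
    "1|3\n"
    "-|anuviado\n"
    "-|aperitivo\n"
    "-|sigla\n"
    "ababelado|1\n"
    "-|atrapalhado|baralhado|atarantado|desnorteado\n"
    "ababelar|1\n"
    "-|baralhar|atrapalhar\n"
)


def parse_dat(text):
    """Return {headword: [(tag, [synonyms]), ...]} from .dat text.

    Assumes line 0 is the encoding name and every headword line is
    'word|N' followed by exactly N meaning lines.
    """
    lines = text.splitlines()
    entries = {}
    i = 1  # skip the encoding line
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate stray blank lines
            continue
        word, count = lines[i].rsplit("|", 1)
        meanings = []
        for raw in lines[i + 1 : i + 1 + int(count)]:
            tag, *synonyms = raw.split("|")
            meanings.append((tag, synonyms))
        entries[word] = meanings
        i += 1 + int(count)
    return entries


def index_offsets(dat_bytes, encoding="utf-8"):
    """Return {headword: byte offset of its line} for raw .dat bytes.

    This is what the numbers in the .idx file appear to be.
    """
    offsets = {}
    pos = 0
    remaining = 0  # meaning lines still belonging to the current headword
    for n, line in enumerate(dat_bytes.splitlines(keepends=True)):
        if n > 0 and line.strip():
            if remaining:
                remaining -= 1
            else:
                text = line.decode(encoding).rstrip("\r\n")
                word, count = text.rsplit("|", 1)
                offsets[word] = pos
                remaining = int(count)
        pos += len(line)
    return offsets


entries = parse_dat(SAMPLE)
offsets = index_offsets(SAMPLE.encode("utf-8"))
print(entries["ababelado"])
print(offsets["1"])
```

For the sample, the headword `1` lands at byte offset 6 (just past `UTF-8` and its newline), which agrees with the `1|6` line in the quoted *.idx* excerpt. A GUI editor would therefore need to regenerate the *.idx* file whenever the *.dat* file changes, since any insertion shifts the offsets of every later entry.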

AFAIK, most AOO thesauri are based on OpenThesaurus

http://sourceforge.net/projects/openthesaurus/

which is already a working web interface for adding words to a thesaurus
database that can be exported to several formats, including the one used
by AOO.

There are localized projects that use OpenThesaurus, such as:

http://openthesaurus.caixamagica.pt/
http://openthes-es.berlios.de/
http://synonimy.sourceforge.net/
http://www.openthesaurus.de/
http://www.openthesaurus.tk
http://synonymer.merg.net/

The PT site seems quite old, but maybe you can find some tips there.
There is an old article from Bruce Byfield here:

http://archive09.linux.com/articles/51675?tid=93

The problem with thesauri, and dictionaries in general, is that they are
far more than a simple list of words: you need to tell the system the
possible variants, whether a word is a noun or a verb, whether it is a
true synonym or just a similar word...

Regards
Ricardo