Hello,

You can find additional information in the slides on the linguistic tools in OpenOffice.org, presented at the OpenOffice.org Conference 2005:
http://danielnaber.de/publications/ooocon2005-lingucomponent.pdf

>Hello my dear ones,
>
>A couple of days ago I was on IRC in #dev.openoffice.org chatting with JZA.
>
>I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
>JZA told me the files were in TXT format and gave me a URL with some
>information, but I took a quick look and didn't find anything about the data
>dictionary of the thesaurus.
>
>The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is
>this a good name? PureBasic can compile for Windows/Linux/Mac/Amiga.
>
>The reason I want to code it is that months ago I contacted my friends
>at Minho University in Portugal, who are in charge of PT-pt, and I wanted to
>send them words to be used as synonyms, but they didn't know how to add them.
>
>This made me think that there isn't a tool for doing that, so my idea is good
>because it can be used by the whole community of developers.
>
>I unzipped the Portuguese .OXT and grabbed the files:
>- th_pt_PT.idx
>- th_pt_PT.dat
>
>I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but
>didn't completely understand how they work.
>
>For example, in the .idx one I had:
>UTF-8
>12940
>1|6
>a cerca de|16097
>a começar de|19986
>a favor|32934
>a partir de|67469
>a respeito de|77248
> ... etc...
>
>
>in the .dat one I had:
>UTF-8
>1|3
>-|anuviado
>-|aperitivo
>-|sigla
>ababelado|1
>-|atrapalhado|baralhado|atarantado|desnorteado
>ababelar|1
>-|baralhar|atrapalhar
>abaçanado|1
> ... etc...
>
>It seems there are at least three levels of synonyms in the .dat one but I
>don't know how to interpret them if I create a GUI.
>
>Also, in the .idx one there are numbers too, whose meaning I don't
>understand.
>
>Is there a URL which explains every detail of those files?

-- 
Yakov Reztsov
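
For reference, the quoted samples appear to follow the MyThes thesaurus layout. As far as I can tell, the .idx file holds the encoding, the number of entries, and then one "entry|byte offset into the .dat file" line per entry; the .dat file holds the encoding and then, for each entry, an "entry|number of meanings" line followed by that many "label|synonym|synonym..." lines (the label is "-" in the Portuguese file). The quoted data is consistent with this reading: "1|6" in the .idx points at byte 6 of the .dat, which is exactly where "1|3" starts after the 6-byte "UTF-8" line. Below is a minimal Python sketch under those assumptions; the function names parse_idx and read_entry are made up for illustration, and the sample strings are just the lines quoted above.

```python
import io

# Sample lines copied from the quoted message (incomplete entries left out).
IDX_SAMPLE = "\n".join([
    "UTF-8", "12940", "1|6", "a cerca de|16097", "a começar de|19986",
]) + "\n"
DAT_SAMPLE = "\n".join([
    "UTF-8",
    "1|3", "-|anuviado", "-|aperitivo", "-|sigla",
    "ababelado|1", "-|atrapalhado|baralhado|atarantado|desnorteado",
    "ababelar|1", "-|baralhar|atrapalhar",
]) + "\n"


def parse_idx(text):
    """Return (encoding, declared entry count, {entry: byte offset})."""
    lines = text.splitlines()
    encoding = lines[0].strip()
    count = int(lines[1])
    offsets = {}
    for line in lines[2:]:
        if not line.strip():
            continue
        # Split on the last '|' so the entry text itself is kept whole.
        entry, _, offset = line.rpartition("|")
        offsets[entry] = int(offset)
    return encoding, count, offsets


def read_entry(dat_bytes, offset, encoding="utf-8"):
    """Read one entry at a byte offset: (entry, [(label, [synonyms]), ...])."""
    stream = io.BytesIO(dat_bytes)
    stream.seek(offset)
    header = stream.readline().decode(encoding).rstrip("\r\n")
    entry, _, meaning_count = header.rpartition("|")
    meanings = []
    for _ in range(int(meaning_count)):
        # Assumed layout: first field is a label, the rest are synonyms.
        label, *synonyms = stream.readline().decode(encoding).rstrip("\r\n").split("|")
        meanings.append((label, synonyms))
    return entry, meanings


encoding, count, offsets = parse_idx(IDX_SAMPLE)
DAT_BYTES = DAT_SAMPLE.encode(encoding)
print(read_entry(DAT_BYTES, offsets["1"], encoding))
```

Read this way, the three "-|..." lines under "1|3" are three separate meanings of the entry "1" rather than three nested levels. Since the offsets are byte positions, an editor that changes the .dat file would presumably have to regenerate the .idx file afterwards.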