Hello,
You can find additional information in the slides on Linguistic Tools in
OpenOffice.org from the OpenOffice.org Conference 2005:

http://danielnaber.de/publications/ooocon2005-lingucomponent.pdf


>Hello my dear ones,
>
>A couple of days ago I was on IRC in #dev.openoffice.org chatting with JZA.
>
>I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
>JZA told me the files were in TXT format and gave me a URL with a lot of
>information, but I had a quick look and didn't find anything about the data
>dictionary of the thesaurus.
>
>The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is
>this a good name? PureBasic can compile for Windows/Linux/Mac/Amiga.
>
>The reason I want to code it is that months ago I contacted my friends
>at Minho University in Portugal, who are in charge of PT-pt, and I wanted to
>send them words to be used as synonyms, but they didn't know how to add them.
>
>This made me think that there isn't a tool for doing that, so my idea is good 
>because it can be used by the whole community of developers.
>
>I unzipped the Portuguese .OXT and grabbed the files:
>- th_pt_PT.idx
>- th_pt_PT.dat
>
>I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but 
>didn't understand completely how they work.
>
>For example, in the .idx one I had:
>UTF-8
>12940
>1|6
>a cerca de|16097
>a começar de|19986
>a favor|32934
>a partir de|67469
>a respeito de|77248
>   ... etc...
>
>
>In the .dat one I had:
>UTF-8
>1|3
>-|anuviado
>-|aperitivo
>-|sigla
>ababelado|1
>-|atrapalhado|baralhado|atarantado|desnorteado
>ababelar|1
>-|baralhar|atrapalhar
>abaçanado|1
>   ... etc...
>
>It seems there are at least three levels of synonyms in the .dat one, but I
>don't know how to interpret them if I create a GUI.
>
>Also, in the .idx one there are numbers too, whose meaning I don't
>understand.
>
>Is there a URL which explains every detail of those files?
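
As far as I understand it, these files follow the MyThes thesaurus layout;
please treat what follows as a sketch and check it against the slides. In
the .dat file the first line is the encoding. Each entry then starts with a
line "word|number of meanings", followed by that many lines of the form
"part of speech|synonym|synonym|...". So the three "-|..." lines in your
sample are three separate meanings of the entry "1", not three levels. In
the .idx file the first line is the encoding, the second is the number of
entries, and every following line is "word|byte offset of that entry in the
.dat file". That is why your sample has "1|6": "UTF-8" plus a newline is 6
bytes. The function name parse_dat below is only my own illustration, not
part of any OpenOffice API.

```python
def parse_dat(dat_bytes):
    """Parse MyThes-style .dat data into (encoding, entries).

    Each entry is (word, byte_offset, meanings), and each meaning is
    (part_of_speech, [synonyms]). Assumes the layout described above.
    """
    lines = dat_bytes.splitlines(keepends=True)
    encoding = lines[0].decode("ascii").strip()
    offset = len(lines[0])  # entries start right after the encoding line
    entries = []
    i = 1
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines, e.g. at end of file
            offset += len(lines[i])
            i += 1
            continue
        start = offset
        head = lines[i].decode(encoding).rstrip("\r\n")
        word, _, count = head.rpartition("|")
        offset += len(lines[i])
        i += 1
        meanings = []
        for _ in range(int(count)):
            fields = lines[i].decode(encoding).rstrip("\r\n").split("|")
            meanings.append((fields[0], fields[1:]))
            offset += len(lines[i])
            i += 1
        entries.append((word, start, meanings))
    return encoding, entries


# The beginning of the .dat excerpt quoted above, as raw bytes.
sample = (
    "UTF-8\n"
    "1|3\n"
    "-|anuviado\n"
    "-|aperitivo\n"
    "-|sigla\n"
    "ababelado|1\n"
    "-|atrapalhado|baralhado|atarantado|desnorteado\n"
).encode("utf-8")

encoding, entries = parse_dat(sample)
for word, offset, meanings in entries:
    # "word|offset" is the line an index generator would write to the .idx
    print(f"{word}|{offset}", meanings)
```

The offsets are counted in bytes, not characters, so a GUI that edits the
.dat file would have to regenerate the whole .idx file after every change.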



 --
Yakov Reztsov
