2013/5/5 RGB ES <rgb.m...@gmail.com>

> 2013/5/5 Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>
>
>>  Hello my dear ones,
>>
>> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
>> JZA.
>>
>> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>>
>> JZA told me the files were in TXT format and gave me a URL with a lot of
>> information, but I took a quick look and didn't find anything about the
>> data dictionary of the thesaurus.
>>
>> The tool will be called "Proofing Tool GUI" and will be coded in
>> PureBasic. Is this a good name? PureBasic can compile for
>> Windows/Linux/Mac/Amiga.
>>
>> The reason I want to code it is that months ago I contacted my
>> friends at Minho University in Portugal, who are in charge of PT-pt. I
>> wanted to send them words to be used as synonyms, but they didn't know how
>> to add them.
>>
>> This made me think that there isn't a tool for doing that, so I think the
>> idea is good because it could be used by the whole community of developers.
>>
>> I unzipped the Portuguese .OXT and grabbed the files:
>> - th_pt_PT.idx
>> - th_pt_PT.dat
>>
>> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format
>> but didn't completely understand how they work.
>>
>> For example, in the *.idx* one I had:
>> UTF-8
>> 12940
>> 1|6
>> a cerca de|16097
>> a começar de|19986
>> a favor|32934
>> a partir de|67469
>> a respeito de|77248
>>    ... etc...
>>
>>
>> in the *.dat* one I had:
>> UTF-8
>> 1|3
>> -|anuviado
>> -|aperitivo
>> -|sigla
>> ababelado|1
>> -|atrapalhado|baralhado|atarantado|desnorteado
>> ababelar|1
>> -|baralhar|atrapalhar
>> abaçanado|1
>>    ... etc...
>>
>> It seems there are at least three levels of synonyms in the *.dat* one,
>> but I don't know how to interpret them if I create a GUI.
>>
>> Also, in the *.idx* one there are numbers too, whose meaning I don't
>> understand.
>>
>> Is there a URL which explains every detail of those files?
>>
>> Thanks!
>>
>> Kind regards from,
>>          >Marco A.G.Pinto
>>            -----------------------
>>
>
>
> AFAIK, most AOO thesauri are based on OpenThesaurus
>
> http://sourceforge.net/projects/openthesaurus/
>

The right URL is

https://github.com/danielnaber/openthesaurus



>
>
> which is already a working web interface to add words to a thesaurus
> database that can be exported to several formats, including the one used by
> AOO.
>
> There are localized projects that use OpenThesaurus, such as
>
> http://openthesaurus.caixamagica.pt/
> http://openthes-es.berlios.de/
> http://synonimy.sourceforge.net/
> http://www.openthesaurus.de/
> http://www.openthesaurus.tk
> http://synonymer.merg.net/
>
> The PT site seems quite old, but maybe you can find some tips there.
>
> There is an old article by Bruce Byfield here
>
> http://archive09.linux.com/articles/51675?tid=93
>
> The problem with thesauri and dictionaries in general is that they are
> far more than a simple list of words: you need to tell the system the
> possible variants, whether a word is a noun or a verb, whether it's a real
> synonym or just a similar word...
>
> Regards
> Ricardo
>
>
>
>
>
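
On the file format question itself: the two excerpts quoted above look like
the MyThes thesaurus layout. As far as I understand it (this is an assumption
drawn from the excerpts, not something confirmed in this thread), the first
line of the .dat file names the encoding, and each entry is a "word|N" line
followed by N meaning lines of the form "pos|synonym|synonym|...". The .idx
file repeats the encoding, gives the number of entries, and then lists
"word|byte offset of that entry in the .dat file". That would explain
"1|6" in the index: "UTF-8" plus a newline is 6 bytes. A minimal Python
sketch under those assumptions (parse_dat and build_idx are illustrative
names, not part of AOO or any library):

```python
def parse_dat(data: bytes):
    """Parse the raw bytes of a MyThes-style .dat file (assumed layout).

    Returns (encoding, entries). Each entry is (word, byte_offset, meanings)
    and each meaning is (part_of_speech, [synonyms]).
    """
    lines = data.splitlines(keepends=True)
    encoding = lines[0].decode("ascii").strip()
    offset = len(lines[0])  # the first entry starts right after the header
    entries = []
    i = 1
    while i < len(lines):
        head = lines[i].decode(encoding).strip()
        if not head:  # tolerate stray blank lines
            offset += len(lines[i])
            i += 1
            continue
        # "word|N": N meaning lines follow this header line
        word, count = head.rsplit("|", 1)
        n = int(count)
        meanings = []
        for raw in lines[i + 1 : i + 1 + n]:
            pos, *synonyms = raw.decode(encoding).strip().split("|")
            meanings.append((pos, synonyms))
        entries.append((word, offset, meanings))
        # advance the byte offset past the header and its meaning lines
        offset += sum(len(raw) for raw in lines[i : i + 1 + n])
        i += 1 + n
    return encoding, entries


def build_idx(encoding, entries):
    """Render .idx text: encoding, entry count, then word|offset lines."""
    rows = [f"{word}|{offset}" for word, offset, _ in entries]
    return "\n".join([encoding, str(len(rows)), *rows]) + "\n"


# The .dat excerpt quoted above, up to the last complete entry.
SAMPLE = (
    "UTF-8\n"
    "1|3\n"
    "-|anuviado\n"
    "-|aperitivo\n"
    "-|sigla\n"
    "ababelado|1\n"
    "-|atrapalhado|baralhado|atarantado|desnorteado\n"
    "ababelar|1\n"
    "-|baralhar|atrapalhar\n"
).encode("utf-8")

encoding, entries = parse_dat(SAMPLE)
for word, offset, meanings in entries:
    print(word, offset, meanings)
print(build_idx(encoding, entries))
```

Offsets are counted in bytes rather than characters, which matters for
accented words such as "abaçanado". I believe the real index is also
expected to be sorted for lookup, so a GUI that adds synonyms would probably
need to rewrite the .dat file and then regenerate the whole .idx rather than
patch single lines.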
