2013/5/5 Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>

>  Hello my dear ones,
>
> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
> JZA.
>
> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
> JZA told me the files were in TXT format and gave me a URL with a lot of
> information, but I had a quick look and didn't find anything about the data
> dictionary of the thesaurus.
>
> The tool will be called "Proofing Tool GUI" and will be coded in
> PureBasic. Is this a good name? PureBasic can compile for
> Windows/Linux/Mac/Amiga.
>
> The reason I want to code it is that months ago I contacted my
> friends at Minho University in Portugal, who are in charge of PT-pt. I
> wanted to send them words to be used as synonyms, but they didn't know how
> to add them.
>
> This made me think that there isn't a tool for doing that, so I believe
> the idea is good because the whole community of developers could use it.
>
> I unzipped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
>
> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but
> didn't completely understand how they work.
>
> For example, in the *.idx* one I had:
> UTF-8
> 12940
> 1|6
> a cerca de|16097
> a começar de|19986
> a favor|32934
> a partir de|67469
> a respeito de|77248
>    ... etc...
>
>
> in the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
>    ... etc...
>
> It seems there are at least three levels of synonyms in the *.dat* one
> but I don't know how to interpret them if I create a GUI.
>
> Also, in the *.idx* one there are numbers whose meaning I don't
> understand.
>
> Is there a URL which explains every detail of those files?
>
> Thanks!
>
> Kind regards from,
>          >Marco A.G.Pinto
>            -----------------------
>


AFAIK, most AOO thesauri are based on OpenThesaurus

http://sourceforge.net/projects/openthesaurus/

which is already a working web interface for adding words to a thesaurus
database that can be exported to several formats, including the one used by
AOO.

There are localized projects that use OpenThesaurus, like

http://openthesaurus.caixamagica.pt/
http://openthes-es.berlios.de/
http://synonimy.sourceforge.net/
http://www.openthesaurus.de/
http://www.openthesaurus.tk
http://synonymer.merg.net/

The PT site seems quite old, but maybe you can find some tips there.

There is an old article by Bruce Byfield here

http://archive09.linux.com/articles/51675?tid=93

The problem with thesauri and dictionaries in general is that they are far
more than a simple list of words: you need to tell the system the possible
variants, whether a word is a noun or a verb, whether it's a real synonym or
just a similar word...
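
About the two files you pasted: as far as I understand the MyThes format
that AOO uses, the .dat file starts with the encoding name, and then each
entry is a line "word|N" followed by N meaning lines. Each meaning line
starts with a part-of-speech field ("-" in your sample, which looks like an
empty placeholder) followed by the synonyms, all separated by "|". So "1|3"
would be the entry "1" with three meanings, not three nested levels. The
.idx file has the encoding, then the number of entries, then "word|offset"
lines, where the offset is the byte position of that entry inside the .dat
file ("1|6" fits: "UTF-8" plus a newline is 6 bytes). Below is a rough
Python sketch under those assumptions. The names parse_dat, index_offsets
and SAMPLE_DAT are mine, not part of any AOO tool, and a real .idx is
expected to be sorted, which this sketch does not enforce.

```python
# Sketch of a reader for a MyThes-style th_*.dat file.
# Assumption: the layout described above; all names here are hypothetical.

SAMPLE_DAT = (
    "UTF-8\n"
    "1|3\n"
    "-|anuviado\n"
    "-|aperitivo\n"
    "-|sigla\n"
    "ababelado|1\n"
    "-|atrapalhado|baralhado|atarantado|desnorteado\n"
    "ababelar|1\n"
    "-|baralhar|atrapalhar\n"
)


def parse_dat(text):
    """Return {entry: [(part_of_speech, [synonyms, ...]), ...]}."""
    lines = text.splitlines()
    entries = {}
    i = 1  # line 0 holds the encoding name
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate stray blank lines
            continue
        word, count = lines[i].rsplit("|", 1)
        meanings = []
        for line in lines[i + 1 : i + 1 + int(count)]:
            pos, *synonyms = line.split("|")
            meanings.append((pos, synonyms))
        entries[word] = meanings
        i += 1 + int(count)  # jump past this entry's meaning lines
    return entries


def index_offsets(text):
    """Return [(entry, byte_offset), ...] in file order, as the .idx stores."""
    encoding = text.split("\n", 1)[0]
    offsets = []
    position = 0   # byte position of the current line in the .dat file
    remaining = 0  # meaning lines still to skip for the current entry
    for n, raw in enumerate(text.encode(encoding).split(b"\n")):
        if n > 0 and raw:
            if remaining:
                remaining -= 1
            else:
                word, count = raw.decode(encoding).rsplit("|", 1)
                offsets.append((word, position))
                remaining = int(count)
        position += len(raw) + 1  # +1 for the newline byte
    return offsets


if __name__ == "__main__":
    for word, meanings in parse_dat(SAMPLE_DAT).items():
        print(word, meanings)
    print(index_offsets(SAMPLE_DAT))
```

A GUI editor would probably keep the parsed dictionary in memory, let the
user edit it, then write the .dat back and regenerate the .idx from the
new byte offsets instead of patching the numbers by hand.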

Regards
Ricardo


