On Sun, May 5, 2013 at 4:45 PM, Marco A.G.Pinto <
marcoagpi...@mail.telepac.pt> wrote:

>  Hello my dear ones,
>
> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
> JZA.
>
> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
> JZA told me the files were in TXT format and gave me a URL with a lot of
> information, but I took a quick look and didn't find anything about the data
> dictionary of the thesaurus.
>
> The tool will be called "Proofing Tool GUI" and will be coded in
> PureBasic. Is this a good name? PureBasic can compile for
> Windows/Linux/Mac/Amiga.
>
> The reason why I want to code it is because months ago I contacted my
> friends at Minho University in Portugal who are in charge of PT-pt and I
> wanted to send them words to be used as synonyms, but they didn't know how
> to add them.
>
>

This makes me wonder...   Does it still make sense, in the year 2013, for
updates to dictionaries and thesauruses to require a download and install
of a large file?  Is there a way to do this incrementally, even live, based
on a feed (RSS or Atom)?   So I could have AOO "subscribe" to a dictionary
and receive new words as they become popular.  Maybe there can even be the
ability to have a custom subscription that is used only within a company,
to publish special words used there, technical, product names, etc.  You
could even have a menu option as part of spell checking "Add to shared
dictionary...".

-Rob




> This made me think that there isn't a tool for doing that, so my idea is
> good because it can be used by the whole community of developers.
>
> I unzipped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
>
> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but
> didn't understand completely how they work.
>
> For example, in the *.idx* one I had:
> UTF-8
> 12940
> 1|6
> a cerca de|16097
> a começar de|19986
> a favor|32934
> a partir de|67469
> a respeito de|77248
>    ... etc...
>
>
> in the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
>    ... etc...
>
> It seems there are at least three levels of synonyms in the *.dat* one
> but I don't know how to interpret them if I create a GUI.
>
> Also, in the *.idx* one there are numbers whose meaning I don't
> understand.
>
> Is there a URL which explains every detail of those files?
>
> Thanks!
>
> Kind regards from,
>          Marco A.G.Pinto
>            -----------------------
>
>
>
> --
>
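
On the file format question: the quoted samples look consistent with the
MyThes thesaurus layout, as far as I understand it. In the .dat file the
first line names the encoding; each entry is then a "word|N" line followed
by N meaning lines, where the first field of a meaning line is a
part-of-speech or category tag (here just "-") and the remaining fields are
synonyms. In the .idx file the second line is the entry count, and each
following line is "word|byte offset of that entry in the .dat file" (the
"1|6" line fits, since "UTF-8" plus a newline is 6 bytes). So the apparent
"three levels" would be a single entry, "1", with three meanings. A minimal
Python sketch under those assumptions; the function names are mine, not from
any AOO tool, and the samples are the truncated excerpts quoted above:

```python
# Sketch only: assumes the MyThes-style layout described in the lead-in.
# parse_idx / parse_dat are hypothetical helper names, not an existing API.

IDX_SAMPLE = """UTF-8
12940
1|6
a cerca de|16097
a começar de|19986
a favor|32934
"""

DAT_SAMPLE = """UTF-8
1|3
-|anuviado
-|aperitivo
-|sigla
ababelado|1
-|atrapalhado|baralhado|atarantado|desnorteado
ababelar|1
-|baralhar|atrapalhar
"""


def parse_idx(text):
    """Return (encoding, declared_count, {word: byte_offset_into_dat})."""
    lines = text.splitlines()
    encoding = lines[0].strip()
    declared_count = int(lines[1])
    index = {}
    for line in lines[2:]:
        if not line.strip():
            continue
        # Split on the last '|' so the word itself is kept whole.
        word, _, offset = line.rpartition("|")
        index[word] = int(offset)
    return encoding, declared_count, index


def parse_dat(text):
    """Return (encoding, {word: [(tag, [synonyms...]), ...]})."""
    lines = text.splitlines()
    encoding = lines[0].strip()
    entries = {}
    i = 1
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        word, _, n_meanings = lines[i].rpartition("|")
        n = int(n_meanings)
        meanings = []
        for raw in lines[i + 1:i + 1 + n]:
            tag, *synonyms = raw.split("|")
            meanings.append((tag, synonyms))
        entries[word] = meanings
        i += 1 + n  # skip the header line plus its meaning lines
    return encoding, entries


if __name__ == "__main__":
    _, _, index = parse_idx(IDX_SAMPLE)
    _, entries = parse_dat(DAT_SAMPLE)
    for word, meanings in entries.items():
        print(word, index.get(word), meanings)
```

A GUI could then show one row per meaning line under each headword, and
would need to regenerate the .idx offsets whenever the .dat file is edited,
since any insertion shifts the byte positions of the entries after it.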
