Thank you for your detailed answer. I have learned a lot more about this stuff now :) As I see it, according to the results, it's between Hunspell and Aspell. My Aspell version is 0.6, released in 2006. The Hunspell one was released in 2008.

When I run the Postgres command \dFt I get the following list:

- ispell
- simple
- snowball
- synonym
- thesaurus

So I set up my dictionary with ispell as the template and the Hunspell/Aspell files. Now I just have one decision to make :)

Just another thing:

> If you want to support multiple language dictionaries for a single table,
> with each row associated to its own dictionary

Not really. Since the two languages don't overlap, couldn't I set up two separate dictionaries and index against both on the whole table? I think that's what Oleg was referring to. Not sure...

Thanks for all the help

/ Moe

PS. I can't read Arabic, so I can't look at the files to decide :O

On Fri, Jan 9, 2009 at 2:14 PM, Andrew <ar...@pacific.net.au> wrote:

> Hi Mohammed,
>
> See my answers below, and hopefully they won't lead you too far astray.
> Note though, it has been a long time since I have done this and there are
> doubtless more knowledgeable people in this forum who will be able to
> correct anything I say that may be misleading or incorrect.
>
> Cheers,
>
> Andy
>
> Mohamed wrote:
>
> no one ?
>
> / Moe
>
>
> On Thu, Jan 8, 2009 at 11:46 AM, Mohamed <mohamed5432154...@gmail.com> wrote:
>
>> Ok, thank you all for your help. It has been very valuable. I am starting
>> to get the hang of it and have read almost the whole of chapter 12 + extras,
>> but I still need a little bit of guidance.
>>
>> I now have these files:
>>
>> - An Arabic Hunspell rar file (OpenOffice version) which includes:
>>   - ar.dic
>>   - ar.aff
>> - An Aspell rar file that includes a lot of files
>> - A MySpell (says simple words list)
>> - And also Andrew's two files:
>>   - ar.affix
>>   - ar.stop
>>
>> I am thinking that I should go with just one of these, right, and that
>> should be the Hunspell?
>>
> Hunspell is based on MySpell, extending it with support for complex
> compound words and Unicode characters; however, PostgreSQL cannot take
> advantage of Hunspell's compound word capabilities at present.
> Aspell is a GNU dictionary that replaces Ispell and supports UTF-8
> characters. See http://aspell.net/test/ for comparisons between
> dictionaries, though be aware this test is hosted by Aspell... I will
> leave it to others to argue the merits of Hunspell vs. Aspell, and why
> you would choose one or the other.
>
>> There is an ar.aff file there and Andrew's file ends with .affix, are
>> those perhaps similar? Should I skip Andrew's?
>>
> The ar.aff file that comes with the OpenOffice Hunspell dictionary is
> essentially the same as the ar.affix I supplied. Just open the two up,
> compare them and choose the one that you feel is best. A Hunspell
> dictionary will work better with a corresponding affix file.
>
>> Use just the ar.stop file?
>>
> The ar.stop file lists common words that should be excluded from
> indexing. You will want a stop file as well as the dictionary and affix
> file. Feel free to modify the stop file to meet your own needs.
>
>
>> On the Arabic / English per-row language search approach, I will
>> skip it and choose the approach suggested by Oleg:
>>
>>> if arabic and english characters are not overlapped, you can use one
>>> index.
>>>
>>
>> The Arabic letters and English letters or words don't overlap, so that
>> should not be an issue? Will I be able to index and search against both
>> languages in the same query?
>>
> If you want to support multiple language dictionaries for a single
> table, with each row associated to its own dictionary, use the
> tsvector_update_trigger_column trigger to automatically update your tsvector
> indexed column on insert or update. To support this, your table will need
> an additional column of type regconfig that contains the name of the
> text search configuration (which in turn selects the dictionaries) to use
> when building the tsvector column for that particular row. See
> http://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
> for more details.
> This will allow you to search across both languages in the one query,
> as you were asking.
>
>
>> And also
>>
>> 1. What language files should I use?
>> 2. What does my create dictionary for the Arabic language look like?
>> Perhaps like this:
>>
>> CREATE TEXT SEARCH DICTIONARY arabic_dic(
>>     TEMPLATE = ? ,   -- Not sure what this means
>>     DictFile = ar,   -- referring to ar.dic (hunspell)
>>     AffFile = ar ,   -- referring to ar.aff (hunspell)
>>     StopWords = ar   -- referring to Andrew's stop file. (what about Andrew's .affix file?)
>>
>>     -- Anything more?
>> );
>>
>>
> From the psql command line you can find out what templates you have using
> the following command:
>
> \dFt
>
> or by looking at the contents of the pg_ts_template table.
>
> If choosing a Hunspell or Aspell dictionary, I believe a value of TEMPLATE
> = ispell should be okay for you - see
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY.
> The template provides instructions to PostgreSQL on how to interact with the
> dictionary. The rest of the create dictionary statement appears fine to me.
>
>> Thanks again! / Moe
>>
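
For reference, the pieces discussed in this thread might fit together roughly as in the sketch below. It is not a tested setup. It assumes the Hunspell files have been copied into $SHAREDIR/tsearch_data in the database encoding and renamed to ar.dict and ar.affix (the ispell template looks for the .dict, .affix and .stop extensions), alongside ar.stop. The names arabic_ispell, arabic_cfg, documents, body, config and tsv are made up for illustration and do not come from the thread.

```sql
-- Dictionary built on the ispell template, as suggested above.
-- Assumes ar.dict, ar.affix and ar.stop exist in $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY arabic_ispell (
    TEMPLATE  = ispell,
    DictFile  = ar,   -- reads ar.dict
    AffFile   = ar,   -- reads ar.affix
    StopWords = ar    -- reads ar.stop
);

-- Hypothetical configuration: start from "simple" and route word tokens
-- through the new dictionary, falling back to "simple" for unknown words.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = pg_catalog.simple);

ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH arabic_ispell, simple;

-- Hypothetical table with a per-row configuration column (type regconfig).
CREATE TABLE documents (
    id     serial PRIMARY KEY,
    body   text,
    config regconfig NOT NULL DEFAULT 'pg_catalog.english'::regconfig,
    tsv    tsvector
);

-- Keep tsv up to date, using the configuration named in each row.
CREATE TRIGGER documents_tsv_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger_column(tsv, config, body);

CREATE INDEX documents_tsv_idx ON documents USING gin (tsv);
```

With something like this in place, Arabic rows would be inserted with config = 'arabic_cfg' and English rows with the default, and a single query against tsv covers both.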