Indexing problems in a dictionary

Ahmet Aksoy Sat, 03 Sep 2005 01:13:31 -0700

Hi,

I'm using Lucene in an open source Java project at http://belletmen.dev.java.net . The project contains several dictionaries with a simple structure. Each item is composed of a "phrase" and a "definition". Either part might be a single word or many words.

Since both parts might contain multiple words, I used the following:
   private Document buildDocument(SozlukBirimi birim){
       Document doc = new Document();

       doc.add(Field.Keyword("soz", birim.getSoz()));   // "soz" means "word" in Turkish
       doc.add(Field.Text("soz1", birim.getSoz()));     // same content as the keyword field
       doc.add(Field.Text("anlam", birim.getAnlam()));  // "anlam" means "meaning" in Turkish

       return doc;
   }

As you can see, I used the first part both as a keyword field and as a text field. The reason is that the program also needs to find phrases or single words within the first part. In the early stages of the application there was a single English-Turkish dictionary, and I used an analyzer that includes both English and Turkish stop words.
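
To show what I mean by an analyzer with stop words, here is a small plain-Java sketch. It is not Lucene's analyzer and not the code in my project: the class name, the stop list, and the whitespace tokenizing are all assumptions made up for illustration. It only shows how a short dictionary phrase can lose some or all of its words once stop words are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a stop-word analyzer; NOT a Lucene class.
class StopWordDemo {
    // Made-up mixed English/Turkish stop list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "to", "be", "of", "or", "not", "ve", "bir"));

    // Lower-cases the text, splits on whitespace, and drops stop words.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Every word is a stop word, so nothing is left to search on.
        System.out.println(analyze("to be or not to be"));   // prints []
        // Only part of the phrase survives.
        System.out.println(analyze("the apple of my eye"));  // prints [apple, my, eye]
    }
}
```

With a stop list like this, a phrase made up entirely of stop words would leave nothing in the text field, and could then only be found through the untokenized keyword field.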

And here are my questions:

1- Do you think the above scheme is a good solution for a dictionary?
2- I am now hesitant about using stop words in a dictionary. What do you think?
3- I have quite a big timing problem. For an English-English dictionary of 107155 items, indexing took 1436 seconds on a 600MHz Pentium 4 laptop with 256 MB of memory. Is that normal, or am I going about this in completely the wrong way?
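
To put the timing in question 3 into perspective, here is the arithmetic as a small Java sketch. The class and method names are made up for illustration; the only inputs are the two figures above.

```java
import java.util.Locale;

// Hypothetical helper; it only converts the reported totals into rates.
class IndexingRate {
    // Average number of items indexed per second.
    static double itemsPerSecond(long items, long seconds) {
        return (double) items / seconds;
    }

    // Average time spent on one item, in milliseconds.
    static double millisPerItem(long items, long seconds) {
        return 1000.0 * seconds / items;
    }

    public static void main(String[] args) {
        long items = 107155;   // dictionary entries indexed
        long seconds = 1436;   // total indexing time
        System.out.printf(Locale.ROOT, "%.1f items/s, %.1f ms/item%n",
                itemsPerSecond(items, seconds), millisPerItem(items, seconds));
        // prints: 74.6 items/s, 13.4 ms/item
    }
}
```

That is about 24 minutes in total, or roughly 13 ms for each dictionary entry.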

I'm waiting for your suggestions.
Thanks a lot.
Ahmet Aksoy


