Hi,
I'm using Lucene in an open source Java project at http://belletmen.dev.java.net . The project contains several dictionaries with a simple structure. Each item is composed of a "phrase" and a "definition". Either part may be a single word or many words.
Since both parts might contain multiple words, I used the following:
   private Document buildDocument(SozlukBirimi birim){
       Document doc = new Document();
       doc.add(Field.Keyword("soz", birim.getSoz()));  // soz means word in Turkish
       doc.add(Field.Text("soz1", birim.getSoz()));    // the same as keyword part
       doc.add(Field.Text("anlam", birim.getAnlam())); // anlam means meaning in Turkish
       return doc;
   }
As you can see, I indexed the first part both as a keyword field and as a text field. The reason is that the program will also try to find phrases or single words within the first part. In the early stages of the application, there was a single English-Turkish dictionary, and I used an analyzer that included both English and Turkish stop words.
And here are my questions:
1- Do you think the above scheme is a good solution for a dictionary?
2- I'm now hesitant about using stop words in a dictionary. What do you think?
3- I have quite a big timing problem. For an English-English dictionary of 107155 items, indexing took 1436 seconds on a 600MHz Pentium 4 laptop with 256 MB of memory. Is that normal? Or am I going about this in completely the wrong way?
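To put the timing in question 3 in perspective, the figures quoted work out to roughly 74.6 items per second, or about 13.4 ms per item. A minimal sketch of that arithmetic in plain Java follows; the class and method names are made up for illustration and are not part of the project or of Lucene:

```java
// Hypothetical helper, not part of belletmen or Lucene: it only restates
// the timing figures from question 3 as throughput numbers.
public class IndexingThroughput {

    // Items indexed per second of wall-clock time.
    static double itemsPerSecond(int items, double seconds) {
        return items / seconds;
    }

    // Average milliseconds spent on each item.
    static double millisPerItem(int items, double seconds) {
        return seconds * 1000.0 / items;
    }

    public static void main(String[] args) {
        int items = 107155;      // dictionary size quoted above
        double seconds = 1436.0; // total indexing time quoted above
        System.out.printf("%.1f items/s%n", itemsPerSecond(items, seconds));
        System.out.printf("%.1f ms/item%n", millisPerItem(items, seconds));
    }
}
```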
I'm waiting for your suggestions.
Thanks a lot.
Ahmet Aksoy

