Hi,
I'm using Lucene in an open-source Java project at
http://belletmen.dev.java.net.
The project contains several dictionaries with a simple structure.
Every item consists of a "phrase" and a "definition", and either part
may be a single word or many words.
Since both parts might contain multiple words, I used the following:
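import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

private Document buildDocument(SozlukBirimi birim) {
    Document doc = new Document();
    // "soz" means "word" in Turkish; the whole phrase, untokenized
    doc.add(Field.Keyword("soz", birim.getSoz()));
    // the same value as the keyword field, but tokenized into words
    doc.add(Field.Text("soz1", birim.getSoz()));
    // "anlam" means "meaning" in Turkish
    doc.add(Field.Text("anlam", birim.getAnlam()));
    return doc;
}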
As you can see, I index the first part both as a keyword field and as
a text field, because the program must be able to find either a whole
phrase or a single word inside the first part.
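To make the intent of the two phrase fields concrete, here is a plain-Java sketch (not Lucene code) of the same idea: one map keyed by the whole phrase, like the untokenized "soz" field, and one keyed by each word of the phrase, like the tokenized "soz1" field. The SozlukBirimi class below is a hypothetical minimal stand-in for the project's class, and splitting on whitespace is an assumed, simplified tokenizer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration (not Lucene) of indexing a phrase twice.
public class DictionaryIndexSketch {

    // Hypothetical minimal entry type: a phrase ("soz") and its definition ("anlam").
    static final class SozlukBirimi {
        private final String soz;
        private final String anlam;
        SozlukBirimi(String soz, String anlam) { this.soz = soz; this.anlam = anlam; }
        String getSoz() { return soz; }
        String getAnlam() { return anlam; }
    }

    // Like "soz": the whole phrase maps to its entries (exact, case-sensitive match).
    private final Map<String, List<SozlukBirimi>> keywordIndex = new HashMap<>();
    // Like "soz1": each lower-cased, whitespace-separated word maps to its entries.
    private final Map<String, List<SozlukBirimi>> wordIndex = new HashMap<>();

    void add(SozlukBirimi birim) {
        keywordIndex.computeIfAbsent(birim.getSoz(), k -> new ArrayList<>()).add(birim);
        for (String word : birim.getSoz().toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty()) {
                wordIndex.computeIfAbsent(word, k -> new ArrayList<>()).add(birim);
            }
        }
    }

    // Exact lookup of the full phrase.
    List<SozlukBirimi> findPhrase(String phrase) {
        return keywordIndex.getOrDefault(phrase, Collections.emptyList());
    }

    // Lookup of a single word occurring anywhere in a phrase.
    List<SozlukBirimi> findWord(String word) {
        return wordIndex.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        DictionaryIndexSketch index = new DictionaryIndexSketch();
        index.add(new SozlukBirimi("give up", "vazgeçmek"));
        index.add(new SozlukBirimi("give in", "boyun eğmek"));
        System.out.println(index.findPhrase("give up").size()); // 1
        System.out.println(index.findWord("give").size());      // 2
    }
}
```

The exact-phrase map only matches the full string ("give" alone finds nothing there), while the word map finds both entries; that split is the trade-off the two fields are meant to cover.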
In the early stages of the application there was only a single
English-Turkish dictionary, so I used an analyzer whose stop-word list
includes both English and Turkish stop words.
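The effect of a combined stop list can be sketched in plain Java (this is a toy analyzer, not Lucene's API). The word list is a small hypothetical sample, not the project's actual list. It shows one hazard relevant to question 2: a word that is a stop word in one language can be a real headword in the other, e.g. Turkish "on" ("ten") is dropped by an English stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer: lower-case, split on whitespace, drop stop words.
public class StopWordSketch {

    // Hypothetical sample: a few English stop words plus a few Turkish ones
    // ("ve" = and, "bir" = a/one, "bu" = this).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "to", "in", "on",
            "ve", "bir", "bu"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Bill of Rights")); // [bill, rights]
        System.out.println(analyze("to be"));              // [be]
        System.out.println(analyze("on"));                 // [] (Turkish "ten" is lost)
    }
}
```

In the scheme above only the tokenized fields would pass through such an analyzer; the untokenized keyword field still holds the full phrase, so exact-phrase lookups are not affected by the stop list.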
Here are my questions:
1- Do you think the scheme above is a good solution for a dictionary?
2- I am now hesitant about using stop words in a dictionary. What do
you think?
3- I have a serious performance problem. Indexing an English-English
dictionary of 107,155 items took 1436 seconds on a 600 MHz Pentium 4
laptop with 256 MB of memory. Is that normal, or am I doing something
completely wrong?
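For comparison with other reports, the figures in question 3 can be turned into a per-document rate. The small Java helper below only does that arithmetic; the method names are illustrative.

```java
import java.util.Locale;

// Converts a total indexing time into throughput figures.
public class IndexingRateSketch {

    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    static double millisPerDoc(long docs, double seconds) {
        return seconds * 1000.0 / docs;
    }

    public static void main(String[] args) {
        long docs = 107155;    // items in the English-English dictionary
        double seconds = 1436; // reported total indexing time
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "%.1f docs/s, %.1f ms/doc%n",
                docsPerSecond(docs, seconds), millisPerDoc(docs, seconds));
        // prints "74.6 docs/s, 13.4 ms/doc"
    }
}
```

So each dictionary item costs roughly 13 ms to index, which is the number to watch while trying different settings.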
I look forward to your suggestions.
Thanks a lot.
Ahmet Aksoy