Re: Synonyms and Ranking

2007-12-27 Thread Grant Ingersoll
You can use the payload functionality (have a look at BoostingTermQuery and Michael B.'s excellent ApacheCon talk at http://people.apache.org/~buschmi/apachecon/). Another option is to put the synonyms into a separate field and boost that less than the main field. -Grant On Dec 27, 2007, at 4
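
A minimal sketch of the separate-field approach against the 2.2 Field API (the field names and the 0.2f boost are illustrative, not from the thread):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document buildDoc(String text, String synonyms) {
      Document doc = new Document();
      doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
      // Synonyms live in their own field with a lower index-time boost, so a
      // match on the original term outranks a match that only hits a synonym.
      Field synField = new Field("synonyms", synonyms,
                                 Field.Store.NO, Field.Index.TOKENIZED);
      synField.setBoost(0.2f);
      doc.add(synField);
      return doc;
    }

At query time you would then search both fields, e.g. contents:radiation OR synonyms:radiation, so an exact hit on the main field scores higher than a synonym-only hit.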

Synonyms and Ranking

2007-12-27 Thread Frank Schima
Happy Festivus everyone, So I have my fancy new stemmed, synonym-based Lucene index. Let's say I have the following synonym defined: radiation -> radiotherapy (and the reverse). The search results rank all results exactly the same. Is there a way to boost the actual search term a little higher t

RE: Pagination ...

2007-12-27 Thread Dragon Fly
Thanks. Date: Wed, 26 Dec 2007 13:07:03 -0500 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: Pagination ... CC: java-user@lucene.apache.org You might want to take a look at Solr (http://lucene.apache.org/solr/). You could either use Solr directly, or see how they implement paging.
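
If you stay with plain Lucene, paging is usually done by slicing the hit list; a sketch against the 2.2-era Hits API (page, pageSize, and the "title" field are made up for illustration):

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    void printPage(IndexSearcher searcher, Query query, int page, int pageSize)
        throws Exception {
      Hits hits = searcher.search(query);  // lazy: documents are fetched on demand
      int start = page * pageSize;
      int end = Math.min(start + pageSize, hits.length());
      for (int i = start; i < end; i++) {
        System.out.println(hits.doc(i).get("title")); // "title" is an assumed stored field
      }
    }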

Re: JVM heap when indexing in memory

2007-12-27 Thread tgospodinov
You were right. I forgot I optimized the index at the end. Thanks for your help. Erick Erickson wrote: > > Are you optimizing your index? I suspect that if you are, that's > the problem. > > Why does it matter? Is this just a curiosity question or is there > some problem? > > Best > Erick >
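
For reference, a sketch of the pattern in question: optimize() merges all segments into one, and in a RAMDirectory the old and merged segments briefly coexist on the heap, which matches the jump described in this thread.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    public class InMemoryIndexDemo {
      public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        // ... addDocument() calls ...
        writer.optimize(); // merges all segments into one; old and merged
                           // segments coexist briefly, so heap use spikes here
        writer.close();
      }
    }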

Re: JVM heap when indexing in memory

2007-12-27 Thread tgospodinov
Thanks for your reply, Erick. I am not optimizing my index. I am trying to keep heap use to a minimum. The indexing service that I'm working on will be part of a wizard in an RCP framework and heap use is crucial for performance. Do you know why it's doing that? Erick Erickson wrote: > > Are you

Re: Synonyms in Lucene 2.2

2007-12-27 Thread Erick Erickson
Oh, now I get it. While I thoroughly approve of the strong typing in generics, I continually trip over code written for the 1.4 code base not being approved by the 1.5 compiler... Best Erick On Dec 27, 2007 12:29 PM, Frank Schima <[EMAIL PROTECTED]> wrote: > > Hi Erick, > > > > Erick Erickson w

Re: JVM heap when indexing in memory

2007-12-27 Thread Erick Erickson
Are you optimizing your index? I suspect that if you are, that's the problem. Why does it matter? Is this just a curiosity question or is there some problem? Best Erick On Dec 27, 2007 12:18 PM, tgospodinov <[EMAIL PROTECTED]> wrote: > > Does anyone know why JVM heap use almost doubles at the v

Re: Synonyms in Lucene 2.2

2007-12-27 Thread Frank Schima
Hi Erick, Erick Erickson wrote: > > I don't think this has anything to do with Lucene, the problem > seems to be that your compiler can't find the Java Stack > class. > > You need to set your classpath to include wherever > java.util is on your disk. > I agree it's a Java issue. I'm just u

JVM heap when indexing in memory

2007-12-27 Thread tgospodinov
Does anyone know why JVM heap use almost doubles at the very end when indexing in memory? around 9 megs @ 1:03 min into indexing - around 18 megs @ 1:05 min when indexing is complete -> heap use jumps about 9 megs in 2 sec!?

Re: Synonyms in Lucene 2.2

2007-12-27 Thread Erick Erickson
I don't think this has anything to do with Lucene, the problem seems to be that your compiler can't find the Java Stack class. You need to set your classpath to include wherever java.util is on your disk. Erick On Dec 27, 2007 10:56 AM, Frank Schima <[EMAIL PROTECTED]> wrote: > > Hello all,

Synonyms in Lucene 2.2

2007-12-27 Thread Frank Schima
Hello all, I'm trying to implement a synonym engine in Lucene 2.2 based on the code in the Lucene In Action book. However, I'm getting compile errors: My Synonym filter looks like this: import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.TokenFilter; import org.ap
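
For reference, here is how the book's filter can be written so it compiles once java.util.Stack is imported; a sketch assuming the Lucene in Action SynonymEngine interface (String[] getSynonyms(String) throws IOException), with raw collection types so it builds under both 1.4 and 1.5:

    import java.io.IOException;
    import java.util.Stack;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class SynonymFilter extends TokenFilter {
      private Stack synonymStack = new Stack(); // raw type: compiles under 1.4 and 1.5
      private SynonymEngine engine;             // lookup interface as in the book

      public SynonymFilter(TokenStream in, SynonymEngine engine) {
        super(in);
        this.engine = engine;
      }

      public Token next() throws IOException {
        if (!synonymStack.isEmpty()) {
          return (Token) synonymStack.pop();    // emit buffered synonyms first
        }
        Token token = input.next();
        if (token == null) {
          return null;
        }
        String[] synonyms = engine.getSynonyms(token.termText());
        if (synonyms != null) {
          for (int i = 0; i < synonyms.length; i++) {
            Token syn = new Token(synonyms[i], token.startOffset(), token.endOffset());
            syn.setPositionIncrement(0);        // same position as the original term
            synonymStack.push(syn);
          }
        }
        return token;
      }
    }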

Re: Search by KeyWord, the best practice

2007-12-27 Thread Erick Erickson
As long as you control both ends (i.e. what's indexed and what's searched) then UN_TOKENIZED is fine. Note that case has to match, etc. As an added benefit, you can sort by the field too... Best Erick On Dec 27, 2007 10:31 AM, webspeak <[EMAIL PROTECTED]> wrote: > > Hello, > > Thank you for you
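
A sketch of the search side under those assumptions (the CUSTOMER field name is from the thread; the sort and the method shape are illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TermQuery;

    Hits exactCustomerMatch(IndexSearcher searcher, String value) throws Exception {
      // TermQuery bypasses analysis entirely, so the value must match the
      // UN_TOKENIZED field byte for byte - including case.
      TermQuery query = new TermQuery(new Term("CUSTOMER", value));
      return searcher.search(query, new Sort("CUSTOMER")); // sortable as a bonus
    }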

Re: Search by KeyWord, the best practice

2007-12-27 Thread webspeak
Hello, Thank you for your reply :-) The customer value will be chosen from a dropdown list. The value that is selected must match the value in the CUSTOMER field. I think I don't have to tokenize it... as it is an exact match. Erick Erickson wrote: > > Well, it depends upon what you w

Re: Search by KeyWord, the best practice

2007-12-27 Thread Erick Erickson
Well, it depends upon what you want to accomplish. By indexing UN_TOKENIZED, the text is NOT broken up. So indexing "some text" will not match if you search on "some", "text", or even "text some". You really, really, really need to tell us what it is you want to accomplish before anyone can sugg

Re: Search by KeyWord, the best practice

2007-12-27 Thread Grant Ingersoll
Depends on whether you want fuzzy matches on Customer or not. Assuming this value contains things like first and last name, I would think you would want to tokenize so that you can search for those separately. If it truly contains something that is a single token, then this should be fine

Re: IndexReader open problem?

2007-12-27 Thread Zhou Qi
Erick, Thanks for your advice. I implemented a version count to sync the index and search. 2007/12/27, Erick Erickson <[EMAIL PROTECTED]>: > > If you search the mail archives, you'll find many discussions > of the fact that when you open an index reader, it takes a > snapshot of the index and subseq
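
A sketch of that kind of version check against the 2.2 API (the method shape is illustrative):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    IndexSearcher refreshIfStale(Directory dir, IndexSearcher current)
        throws IOException {
      IndexReader reader = current.getIndexReader();
      // The reader is a snapshot; reopen only when the index version on disk
      // has moved past the version the snapshot was taken at.
      if (IndexReader.getCurrentVersion(dir) != reader.getVersion()) {
        reader.close();                         // release the old snapshot
        current = new IndexSearcher(IndexReader.open(dir));
      }
      return current;
    }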

Search by KeyWord, the best practice

2007-12-27 Thread webspeak
Hello, I would like to search documents by "CUSTOMER". So I search on the field "CUSTOMER" using a KeywordAnalyzer. The CUSTOMER field is indexed with these params: Field.Index.UN_TOKENIZED and Field.Store.YES. Is it the best practice?

Re: IndexReader open problem?

2007-12-27 Thread Erick Erickson
If you search the mail archives, you'll find many discussions of the fact that when you open an index reader, it takes a snapshot of the index and subsequent modifications to the index are not available until the searcher is closed and re-opened. It is NOT a good idea to open a new reader every ti

IndexReader open problem?

2007-12-27 Thread Zhou Qi
Hi all, I encountered a strange problem. To improve performance, I open the IndexReader at startup and reuse it in later searches. I have another process running to do online indexing. The search service and indexing service are accessing the same index folder. But I found out the search servi

Re: StopWords problem

2007-12-27 Thread Doron Cohen
Try printing all these after you close the writer: - ((FSDirectory) dir).getFile().getAbsolutePath() - dir.list().length (n) - dir.list()[0], .. , dir.list()[n-1] This should at least help you verify that an index was created and where. Regards, Doron On Dec 27, 2007 12:26 PM, Liaqat Ali <[EMAIL PR
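
Spelled out as a runnable helper (a sketch; pass it the same Directory your writer used, after the writer is closed):

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    void dumpIndexFiles(Directory dir) throws Exception {
      System.out.println(((FSDirectory) dir).getFile().getAbsolutePath());
      String[] files = dir.list();
      System.out.println(files.length + " files:");
      for (int i = 0; i < files.length; i++) {
        System.out.println("  " + files[i]); // segment files prove an index exists
      }
    }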

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
Doron Cohen wrote: On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote: I got your point. The given program does not give any error during compilation and it is interpreted well. But it does not create any index. When the StandardAnalyzer() is called without the stopwords list

Re: StopWords problem

2007-12-27 Thread Doron Cohen
On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote: > I got your point. The given program does not give any error during > compilation and it is interpreted well. But it does not create any > index. When the StandardAnalyzer() is called without the stopwords list it > works well, b

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
Doron Cohen wrote: This is not a self-contained program - it is incomplete, and it depends on files on *your* disk... Still, can you show why you're saying it indexes stopwords? Can you print here a few samples of IndexReader.terms().term()? BR, Doron On Dec 27, 2007 10:22 AM, Liaqat Ali <[EMAIL

Re: StopWords problem

2007-12-27 Thread Doron Cohen
This is not a self-contained program - it is incomplete, and it depends on files on *your* disk... Still, can you show why you're saying it indexes stopwords? Can you print here a few samples of IndexReader.terms().term()? BR, Doron On Dec 27, 2007 10:22 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:

Re: StopWords problem

2007-12-27 Thread N. Hira
Hi Liaqat, Are you sure that the Urdu characters are being correctly interpreted by the JVM even during the file I/O operation? I would expect Unicode characters to be encoded as multi-byte sequences, and so the string-matching operations would fail (if the literals are different from the
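
One quick way to rule that out is to force the charset when reading the stopword file instead of relying on the platform default; a sketch (the file name is made up, and UTF-8 is assumed to be the file's real encoding):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    List readStopWords() throws IOException {
      // Read with an explicit charset so the Urdu characters survive
      // the file I/O intact regardless of the platform default.
      BufferedReader in = new BufferedReader(
          new InputStreamReader(new FileInputStream("urdu_stopwords.txt"), "UTF-8"));
      List words = new ArrayList();             // raw type, 1.4-friendly
      String line;
      while ((line = in.readLine()) != null) {
        words.add(line.trim());
      }
      in.close();
      return words;
    }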

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
Doron Cohen wrote: Hi Liaqat, This part of the code seems correct and should work, so the problem must be elsewhere. Can you post a short program that demonstrates the problem? You can start with something like this: Document doc = new Document(); doc.add(new Field("text",URDU_STOP_WOR

Re: StopWords problem

2007-12-27 Thread Doron Cohen
Hi Liaqat, This part of the code seems correct and should work, so the problem must be elsewhere. Can you post a short program that demonstrates the problem? You can start with something like this: Document doc = new Document(); doc.add(new Field("text",URDU_STOP_WORDS[0] +
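
Completed along those lines, a self-contained check might look like this (stopWords stands in for your URDU_STOP_WORDS array; a RAMDirectory keeps everything off disk):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.RAMDirectory;

    void checkStopWords(String[] stopWords) throws Exception {
      RAMDirectory dir = new RAMDirectory();
      IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(stopWords), true);
      Document doc = new Document();
      doc.add(new Field("text", stopWords[0] + " " + stopWords[1],
                        Field.Store.NO, Field.Index.TOKENIZED));
      writer.addDocument(doc);
      writer.close();

      // If the analyzer really filters the stopwords, no terms survive.
      IndexReader reader = IndexReader.open(dir);
      TermEnum terms = reader.terms();
      while (terms.next()) {
        System.out.println(terms.term()); // should print nothing
      }
      reader.close();
    }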