Re: Boolean retrieval

2009-07-13 Thread Koji Sekiguchi
tsuraan wrote: Make that "Collector" (new as of 2.9). HitCollector is the old (deprecated as of 2.9) way, which always pre-computed the score of each hit and passed the score to the collect method. Where can I find docs for 2.9? Do I just have to check out the lucene trunk and run javado

Re: Boolean retrieval

2009-07-13 Thread tsuraan
> Make that "Collector" (new as of 2.9). > > HitCollector is the old (deprecated as of 2.9) way, which always > pre-computed the score of each hit and passed the score to the collect > method. Where can I find docs for 2.9? Do I just have to check out the lucene trunk and run javadoc there?

Re: strange issues with IRISH

2009-07-13 Thread John Byrne
Hi, "suspect that [an] is still ignored as a stop word for some reason" Yes, "an" is still a stop word in English of course! (e.g. 'an apple') Your custom analyzer should work; are you making sure to do both your indexing *and* your searching with the new analyzer? I think making a list of Ir

strange issues with IRISH

2009-07-13 Thread OBender
Hi All, I've come across a very strange issue with the Irish language. I have the following set of strings in Irish: ag an gcrosbhealach seo, Lean ar an mórbhealach., Lean an bóthar seo., An bhfuil ... in am imeacht?, An ... sin an t-am ceart? And here is a search string: an Sear

Re: Use of Synonyms

2009-07-13 Thread liat oren
I have my own synonyms, which are different from the ones in WordNet. For every word, I have synonyms and a score for how close each synonym is to its word. I would like to 'elaborate' the query, expanding it so that it also includes the synonyms of the words given in the query. Thanks 2009/7/13 Er

RE: OOM with 2.9

2009-07-13 Thread Uwe Schindler
DONE. > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, July 13, 2009 11:53 AM > To: java-user@lucene.apache.org > Subject: Re: OOM with 2.9 > > Ahh good point. I agree it makes sense to make MMapDir's chunking > user-controllable. Can yo

Re: Use of Synonyms

2009-07-13 Thread Erick Erickson
What are you trying to do? I think you'd get a better response if you explained what higher-level task/feature you're trying to implement. Best Erick On Mon, Jul 13, 2009 at 4:54 AM, liat oren wrote: > Hi all, > > I have a list of synonyms for every word. > Is there a good way to use these synon

Re: [Bulk] RE: Exception at MultiSearcherThread.hits

2009-07-13 Thread Erick Erickson
Please don't hijack a thread, start a new topic. From Hossman: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subjec

Re: Modifying score based on tf and slop

2009-07-13 Thread Rads2029
Hi all, I modified the setFreqCurrentDoc method of SpanScorer as follows: ( Frequency is updated only for the shortest span ) int minMatchLenght=-1; do { int matchLength = spans.end() - spans.start(); if(minMatchLenght==-1)minMatchLenght=matchLength;

Re: speed of BooleanQueries on 2.9

2009-07-13 Thread eks dev
Hi Mike, getMaxNumOfCandidates() in the test was 200. The index is optimised and read-only. We found (due to an error in our warm-up code, funny) that only this Query runs slower on 2.9. A hint where to look could be that this Query contains the two most frequent tokens in two particular fields

Re: OOM with 2.9

2009-07-13 Thread eks dev
Hi Mike, thanks for looking into it... I am now positive, it was definitely a problem for the OS to map() a large contiguous chunk of process memory... if I use this machine for a while as a desktop, eclipse,... I get the same problem again... but after a cold restart, mapping succeeds. The proble

Re: Hindi, diacritics and search results

2009-07-13 Thread KK
Apart from using WhitespaceAnalyzer, which will tokenize words based on spaces, you can try writing a simple custom analyzer which'll do a bit more. I did the following for handling Indic languages intermingled with English content, /** * Analyzer for Indian language. */ public class IndicAnalyzerIn

Re: OOM with 2.9

2009-07-13 Thread Michael McCandless
Ahh good point. I agree it makes sense to make MMapDir's chunking user-controllable. Can you open an issue? Mike On Mon, Jul 13, 2009 at 5:49 AM, Uwe Schindler wrote: >> On Sun, Jul 12, 2009 at 10:51 AM, eks dev wrote: >> >> > MMapDirectory has support for chunking (Integer.MAX_VALUE) anyhow..

Re: speed of BooleanQueries on 2.9

2009-07-13 Thread Michael McCandless
This is not expected; 2.9 has had a number of changes that ought to reduce CPU cost of searching. If this holds up we definitely need to get to the root cause. Did your test exclude the warmup query for both 2.4.1 & 2.9? How many segments in the index? What is the actual value of getMaxNumOfCan

RE: OOM with 2.9

2009-07-13 Thread Uwe Schindler
> On Sun, Jul 12, 2009 at 10:51 AM, eks dev wrote: > > > MMapDirectory has support for chunking (Integer.MAX_VALUE) anyhow... > maybe for such cases this threshold can become user settable. I will try > to experiment with it (I am talking about MMapDirectory -> private final > int MAX_BBUF = Int
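The chunking idea discussed in this thread can be illustrated outside Lucene with plain java.nio. The following is only a rough sketch, not MMapDirectory's actual code: the class name ChunkedMmap, the method names, and the maxChunkSize parameter are hypothetical, chosen to show how a user-settable chunk size lets a large file be mapped as several smaller regions instead of one large contiguous one.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
class ChunkedMmap {

    /** Maps the file read-only as consecutive buffers of at most maxChunkSize bytes each. */
    static MappedByteBuffer[] mapInChunks(Path file, int maxChunkSize) throws IOException {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // Number of chunks, rounding up so the last partial chunk is included.
            int chunkCount = (int) ((size + maxChunkSize - 1) / maxChunkSize);
            MappedByteBuffer[] buffers = new MappedByteBuffer[chunkCount];
            long position = 0;
            for (int i = 0; i < chunkCount; i++) {
                long length = Math.min((long) maxChunkSize, size - position);
                buffers[i] = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                position += length;
            }
            // A mapping stays valid after the channel that created it is closed.
            return buffers;
        }
    }

    /** Reads the byte at an absolute file position by locating its chunk. */
    static byte readByte(MappedByteBuffer[] buffers, int maxChunkSize, long position) {
        int chunk = (int) (position / maxChunkSize);
        int offset = (int) (position % maxChunkSize);
        return buffers[chunk].get(offset);
    }
}
```

With a smaller maxChunkSize, each map() call asks the OS for a smaller contiguous address range, which is the property the thread is after on fragmented 32-bit address spaces. The trade-off is one extra division per read to find the right chunk.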

Re: OOM with 2.9

2009-07-13 Thread Michael McCandless
On Sun, Jul 12, 2009 at 10:51 AM, eks dev wrote: > MMapDirectory has support for chunking (Integer.MAX_VALUE) anyhow... maybe > for such cases this threshold can become user settable. I will try to > experiment with it (I am talking about MMapDirectory -> private final int > MAX_BBUF = Intege

Re: OOM with 2.9

2009-07-13 Thread Michael McCandless
I would not expect a 2.9 IndexReader to consume more RAM. Was this definitely the case? (It wasn't just a matter of other processes taking up RAM). If so, we should drill in to understand the root cause / regression. One thing you can do in 2.9 is IndexReader.setDisableFakeNorms(true), to preve

Use of Synonyms

2009-07-13 Thread liat oren
Hi all, I have a list of synonyms for every word. Is there a good way to use these synonyms? Currently I use a boost query so if 'a' is the queried word, and 'b' (0.5) and 'c' (0.2) are its synonyms, I query for: a^1 + b^0.5 + c^0.2. Is there a better way of doing it? Thanks, Liat
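The boosted expansion described in this message (a^1 + b^0.5 + c^0.2) can be sketched as a small string builder that emits Lucene query-parser boost syntax. This is a minimal illustration under stated assumptions, not a recommended design: the class SynonymQueryExpander and its expand method are hypothetical names, and the synonym weights are assumed to come from the poster's own table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class SynonymQueryExpander {

    /**
     * Builds a query string such as "a^1.0 b^0.5 c^0.2": the original term at
     * full weight, followed by each synonym boosted by its closeness score.
     */
    static String expand(String term, Map<String, Float> weightedSynonyms) {
        StringBuilder query = new StringBuilder();
        query.append(term).append("^1.0");
        for (Map.Entry<String, Float> entry : weightedSynonyms.entrySet()) {
            query.append(' ')
                 .append(entry.getKey())
                 .append('^')
                 .append(entry.getValue().floatValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the synonyms in insertion order.
        Map<String, Float> synonyms = new LinkedHashMap<>();
        synonyms.put("b", 0.5f);
        synonyms.put("c", 0.2f);
        System.out.println(expand("a", synonyms));
    }
}
```

The resulting string would then be handed to a query parser whose default operator is OR, so that any of the terms can match. Terms containing query-syntax characters would need escaping first, which this sketch does not do.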

Re: [Bulk] RE: Exception at MultiSearcherThread.hits

2009-07-13 Thread henok sahilu
Hello there, I can search for "renew" but not for "renewal". When I index I used this code: doc.add(new Field("contents", text,Field.Store.NO, Field.Index.ANALYZED)); and my query was parsed: QueryParser parser = null; File file=new File("StopWordList.txt"); parser

Re: [Bulk] RE: Exception at MultiSearcherThread.hits

2009-07-13 Thread Ganesh
The Exception message is null. When I restart my application, it works fine. Regards Ganesh - Original Message - From: "Uwe Schindler" To: Sent: Monday, July 13, 2009 11:43 AM Subject: [Bulk] RE: Exception at MultiSearcherThread.hits > Can you please post the whole Exception,