Re: cross-field AND queries with field boosting

2009-01-28 Thread Muralidharan V
Karsten, Thanks for the suggestion. After some research, payloads and BoostingTermQuery are what we ended up using. Thanks, Murali On Wed, Jan 28, 2009 at 2:13 AM, Karsten F. wrote: > > Hi Murali, > > I think a search with 4 * 5 = 20 Boolean Clauses will not be a performance > problem >

Re: Group by in Lucene ?

2009-01-28 Thread Mark Miller
Group-by in Lucene/Solr has not been solved in a great general way yet to my knowledge. Ideally, we would want a solution that does not need to fit into memory. However, you need the value of the field for each document to do the grouping. As you are finding, this is not cheap to get. Currentl

Re: Group by in Lucene ?

2009-01-28 Thread Erick Erickson
At a quick glance, this line is really suspicious: Document document = this.indexReader.document(doc) From the Javadoc for HitCollector.collect: Note: This is called in an inner search loop. For good search performance, implementations of this method should not call Searcher.doc(int) or IndexRea

Re: NullPointerException in FieldDocSortedHitQueue.lessThan with custom SortComparator

2009-01-28 Thread Erick Erickson
Well, just glancing at the code you have no assurance that cj != null. See below. On Wed, Jan 28, 2009 at 4:58 AM, ninaS wrote: > > Hello, > > I am using a custom SortComparator implementation where I need to override > a > method in order to handle Null values: > > @Override > public S

Re: Group by in Lucene ?

2009-01-28 Thread Marcus Herou
Oh btw, faceting is easy; it's the distinct part I think is hard. Example Lucene Facet: http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html On Wed, Jan 28, 2009 at 12:43 PM, Marcus Herou wrote: > Hi. > > This is way too slow I think since what you are explaining is somethi

Re: Group by in Lucene ?

2009-01-28 Thread Marcus Herou
Hi. This is way too slow I think since what you are explaining is something I already tested. However I might be using the HitCollector badly. Please prove me wrong. Supplying some code which I tested this with. It stores a hash of the value of the term in a TIntHashSet and just calculates the si

Re: cross-field AND queries with field boosting

2009-01-28 Thread Karsten F.
Hi Murali, I think a search with 4 * 5 = 20 Boolean Clauses will not be a performance problem (at least if you have only one optimized index-folder). You also could use one Field which contains content of all other fields with a boost factor for each term (different boost for content from diffe

NullPointerException in FieldDocSortedHitQueue.lessThan with custom SortComparator

2009-01-28 Thread ninaS
Hello, I am using a custom SortComparator implementation where I need to override a method in order to handle Null values: @Override public ScoreDocComparator newComparator (final IndexReader reader, final String fieldname) throws IOException { final Strin

Re: Group by in Lucene ?

2009-01-28 Thread ninaS
By the way: if you only need to count documents (count groups) HitCollector is a good choice. If you only count you don't need to sort anything. ninaS wrote: > > Hello, > > yes I tried HitCollector but I am not satisfied with it because you can > not use sorting with HitCollector unless you
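
A minimal sketch of the counting idea above, assuming the group value of every document has already been loaded into an array indexed by document number. The class GroupCountSketch, its groupKeyByDoc array, and its counts() method are hypothetical names used only for illustration; nothing here is a Lucene API. The collect(doc, score) method merely mirrors the shape of a hit-collector callback, and no sorting is involved:

```java
import java.util.HashMap;
import java.util.Map;

public class GroupCountSketch {
    // Hypothetical stand-in for per-document field values loaded once,
    // indexed by document number (not a Lucene class or field).
    private final String[] groupKeyByDoc;

    // Number of collected hits per group value.
    private final Map<String, Integer> counts = new HashMap<>();

    public GroupCountSketch(String[] groupKeyByDoc) {
        this.groupKeyByDoc = groupKeyByDoc;
    }

    // Same shape as a collect(doc, score) callback: one array lookup and
    // one map update per hit, no stored document is loaded.
    public void collect(int doc, float score) {
        counts.merge(groupKeyByDoc[doc], 1, Integer::sum);
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: five documents, three distinct group values.
        String[] keys = {"a", "b", "a", "c", "a"};
        GroupCountSketch collector = new GroupCountSketch(keys);

        // Pretend the query matched documents 0, 1, 2 and 4.
        for (int doc : new int[] {0, 1, 2, 4}) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.counts());
    }
}
```

The score argument is ignored on purpose: when only group sizes are needed, the order of the hits does not matter.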

Re: Group by in Lucene ?

2009-01-28 Thread ninaS
Hello, yes I tried HitCollector but I am not satisfied with it because you can not use sorting with HitCollector unless you implement a way to use TopFieldDocCollector. I did not manage to do that in a performant way. It is easier to first do a normal search and "group by" afterwards: Iterate