Re: NGraming document for similar documents matching

2012-01-26 Thread Trejkaz
On Fri, Jan 27, 2012 at 10:41 AM, Saurabh Gokhale wrote: > I wanted to check if Ngraming the document contents (space is not the > issue) would do any good for better matching? Currently I see Ngram is > mostly used for auto complete or spell checker but is this useful for > similarity search? I
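For anyone following the question, here is a minimal sketch of what n-gram based similarity can look like, assuming character trigrams and Jaccard overlap as the measure. It is plain Python rather than a Lucene analysis chain, and the function names (char_ngrams, jaccard, ngram_similarity) are made up for illustration; it is not the approach proposed in this thread.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    if len(normalised) < n:
        # Text shorter than n: treat the whole text as a single gram (or no grams if empty).
        return {normalised} if normalised else set()
    return {normalised[i:i + n] for i in range(len(normalised) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b|, defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(doc_a, doc_b, n=3):
    """Similarity of two documents measured on their character n-gram sets."""
    return jaccard(char_ngrams(doc_a, n), char_ngrams(doc_b, n))
```

Calling ngram_similarity(text_a, text_b) gives a value between 0.0 and 1.0. Sets ignore how often a gram occurs, so a frequency-weighted measure may suit long documents better.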

Re: BlockJoinQuery in text queries

2012-01-26 Thread Michael McCandless
I don't think there is one yet... it's [still] one of the limitations I listed here: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html But... if there were one, I don't think it would be user controllable. I think it's more of an up-front schema thing, e.g. you'd tell

Distributed index: Infinispan Directory or GlusterFS?

2012-01-26 Thread Francisco A. Lozano
Hi, I will very soon need to have a large number of small indexes directly accessible for R/W from N machines. I am evaluating the Infinispan Lucene Directory implementation. So far I haven't found any problems and performance looks good, but it looks scary because I don't see too many re

RE: Query term counting, again...

2012-01-26 Thread David Olson
Thanks Mike - I spent a few hours tracing through the explain process last night and could see all that and it looked like most was reachable without having to alter core classes. The other thing I thought of since I'm doing this as a one-time shot as messages come in (persisting aggregate counts)

RE: Query term counting, again...

2012-01-26 Thread Uwe Schindler
You have to take care that BooleanScorer2 is used, by requesting docsInOrder. Then it's very nice, I have a customer using this. The important thing is that your Collector returns the right thing :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@th

Re: Query term counting, again...

2012-01-26 Thread Michael McCandless
You should be able to use the Scorer.visitSubScorers API? You'd do this up front, to recursively gather all "interesting" scorers in the Query, and then in a custom collector, in the collect method, you can go and ask each subScorer whether it matched the current document (call its .freq() and see
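The shape of that idea (gather the interesting pieces of the query once, then check each of them inside collect) can be sketched with a toy in-memory analogue. This is not the Lucene API: documents are plain token lists, the "sub-scorers" are just the query terms, and TermMatchCollector and count_term_matches are hypothetical names used only for illustration.

```python
from collections import Counter


class TermMatchCollector:
    """Toy stand-in for a custom collector: tallies, per query term,
    how many collected documents contain that term."""

    def __init__(self, query_terms):
        # Gathered once, up front, like the recursive sub-scorer walk.
        self.query_terms = list(query_terms)
        self.docs_matching = Counter()

    def collect(self, doc_tokens):
        # Ask each "sub-scorer" (here: each term) whether it matched this document.
        freqs = Counter(doc_tokens)
        for term in self.query_terms:
            if freqs[term] > 0:
                self.docs_matching[term] += 1


def count_term_matches(docs, query_terms):
    """Run an OR-style query over docs and return per-term match counts."""
    collector = TermMatchCollector(query_terms)
    wanted = set(query_terms)
    for tokens in docs:
        # A document is a hit if at least one query term occurs in it.
        if wanted.intersection(tokens):
            collector.collect(tokens)
    return dict(collector.docs_matching)
```

In this toy version the per-term check is a dictionary lookup; in the approach described above, that role would be played by asking each gathered sub-scorer about the current document.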

Re: Using dismax features in Lucene

2012-01-26 Thread Paul Taylor
On 10/01/2012 18:16, Chris Hostetter wrote: : The book said that dismax query was similar but different to : : DisjunctionMaxQuery the dismax *parser* in Solr is relatively simple, the majority of the code in it relates to parsing config options, reporting debugging, etc... if you wanted to do