Re: Optimizing unordered queries

2009-07-06 Thread Nigel
On Mon, Jul 6, 2009 at 12:37 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Jun 29, 2009 at 9:33 AM, Nigel wrote: > > > Ah, I was confused by the index divisor being 1 by default: I thought it > > meant that all terms were being loaded. I see now in SegmentTermEnum > that >

Re: Free software for language detection

2009-07-06 Thread Vinicius Carvalho
You can also check Google's language API (I'm writing a blog entry on this and hope to post it tomorrow): http://code.google.com/apis/ajaxlanguage/documentation/reference.html Here's a snippet of it working (using Json Simple to decode: http://code.google.com/p/json-simple/): try { String s =

Re: Modifying score based on tf and slop

2009-07-06 Thread Radha Sreedharan
Thanks a lot, Mark. Do correct me if I am wrong, but what this means is that tf does not really have the same meaning as it does for other queries. Also, I think I now better understand what hossman said, in the sense that BC appears in two matching spans, which is why we get a higher score - th

Re: Optimizing unordered queries

2009-07-06 Thread Michael McCandless
On Mon, Jun 29, 2009 at 9:33 AM, Nigel wrote: > Ah, I was confused by the index divisor being 1 by default: I thought it > meant that all terms were being loaded.  I see now in SegmentTermEnum that > the every-128th behavior is implemented at a lower level. > > But I'm even more confused about why

Re: Modifying score based on tf and slop

2009-07-06 Thread Mark Miller
tf() is used, just not with the term freq - the length of the matching Spans is used instead. The terms from nested Spans will still affect the score (you still get IDF), but term freq is substituted with matching Span length. Also, boosts of nested Spans are ignored - only the top level boos
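To make the substitution described above concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the default behaviour of that era, where each matching span contributes sloppyFreq(matchLength) = 1 / (matchLength + 1) to an accumulated frequency and tf() is the square root of that sum. The class name, method names, and span positions are hypothetical and chosen only for illustration; IDF, norms, and boosts are left out.

```java
public class SpanTfSketch {

    // Assumption: mirrors the default sloppyFreq, so shorter matches count more.
    static float sloppyFreq(int matchLength) {
        return 1.0f / (matchLength + 1);
    }

    // Assumption: mirrors the default tf, the square root of the frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Each int[] is a hypothetical {start, end} position pair for one matching span.
    static float spanTf(int[][] spans) {
        float freq = 0f;
        for (int[] span : spans) {
            // The span length stands in for the term frequency.
            freq += sloppyFreq(span[1] - span[0]);
        }
        return tf(freq);
    }

    public static void main(String[] args) {
        int[][] oneMatch = { {0, 2} };            // a single matching span
        int[][] twoMatches = { {0, 2}, {0, 3} };  // a second, longer matching span
        System.out.println("one match:   " + spanTf(oneMatch));
        System.out.println("two matches: " + spanTf(twoMatches));
    }
}
```

Under these assumptions a document with more matching spans accumulates a larger frequency and so gets a larger tf factor, even though no per-term frequency is consulted.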

Re: Modifying score based on tf and slop

2009-07-06 Thread Rads2029
Thanks, that helped clear up quite a few things. A few questions though: 1) Regarding tf not making a difference: I do believe that overriding tf to return 1 makes a difference. When I did not override tf, the score on doc (AB BC BC CD) was higher than on doc (AB BC CD). When I did not override tf the s