Re: consistent ordering of multi-values in a field

2009-07-07 Thread Michael McCandless
On Tue, Jul 7, 2009 at 3:49 PM, Chris Lu wrote:
> Will the ordering of fields be preserved also?

Alas, no. This used to be true (before 2.3), but 2.3 broke it (mea culpa -- sorry!), and we're now going to fix it again in 2.9. LUCENE-1727 is tracking this. So in 2.9 it will be true, but in 2.3.

Re: Alternative scoring of BooleanQuery

2009-07-07 Thread Chris Hostetter
: false). In my context this is not really what I want. I would prefer to have a
: simple "maximum" function over the scores of the subqueries. Since I do not
: consider myself an expert in the internal working of Lucene, is there an easy
: way to achieve this or do I have to reimplement the whole
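The "maximum over the subquery scores" idea in the quoted question can be sketched in plain Java, with no Lucene dependency. The class and method names below are hypothetical and the scores are made-up numbers. Lucene's DisjunctionMaxQuery scores in this style (the maximum, plus an optional tie-breaker share of the remaining scores); whether it fits the poster's setup is not established by this excerpt.

```java
// Hypothetical illustration only; not Lucene code.
class MaxScoreSketch {
    // Sum-style combination: every matching subquery adds to the total.
    static double sumScore(double... subScores) {
        double sum = 0.0;
        for (double s : subScores) {
            sum += s;
        }
        return sum;
    }

    // Max-style combination: only the best subquery counts.
    static double maxScore(double... subScores) {
        double max = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
        }
        return max;
    }

    // Max plus a fraction (tieBreaker) of the other subquery scores.
    static double maxWithTieBreaker(double tieBreaker, double... subScores) {
        double max = maxScore(subScores);
        return max + tieBreaker * (sumScore(subScores) - max);
    }

    public static void main(String[] args) {
        double fieldA = 0.4; // assumed score of the first subquery
        double fieldB = 0.7; // assumed score of the second subquery
        System.out.println("sum: " + sumScore(fieldA, fieldB));
        System.out.println("max: " + maxScore(fieldA, fieldB));
        System.out.println("max + 0.1 tie-breaker: " + maxWithTieBreaker(0.1, fieldA, fieldB));
    }
}
```

With a tie-breaker of 0 the last method reduces to the plain maximum, so a document is not rewarded for matching in both fields.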

Re: Multi Value field

2009-07-07 Thread Mark Harwood
> I just tried the norms idea as well; no change.

You'll need to look at searcher.explain() for the two docs, or post a JUnit test or code example that can be executed and shows the issue.

Re: Multi Value field

2009-07-07 Thread John Seer
I already tried using a custom similarity (I set all methods to return 1f); it doesn't work. I just tried the norms idea as well; no change.

markharw00d wrote:
>> if the term is "X Y" the document 2 is getting a higher score than
>> document 1.
>
> That may be length normalisation at play. Doc 2 is

Re: consistent ordering of multi-values in a field

2009-07-07 Thread Chris Lu
That's great, and thanks for the super fast answer!

Another question, if it's not thread-hijacking: will the ordering of fields be preserved also?

--
Chris Lu
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight

Re: consistent ordering of multi-values in a field

2009-07-07 Thread Michael McCandless
Yes, order within the same field name should be preserved.

Mike

On Tue, Jul 7, 2009 at 3:36 PM, Chris Lu wrote:
> Hi,
>
> When using org.apache.lucene.document.Document.getValues(fieldName), will it
> be the same order that I added to the document?
>
> Suppose I add field1~value1, field1~value2,

consistent ordering of multi-values in a field

2009-07-07 Thread Chris Lu
Hi, When using org.apache.lucene.document.Document.getValues(fieldName), will it be the same order that I added to the document? Suppose I add field1~value1, field1~value2, field2~value3 to a document. Later, maybe after several rounds of merging, will I always get an array of {value1,value2

Re: Multi Value field

2009-07-07 Thread Mark Harwood
> if the term is "X Y" the document 2 is getting a higher score than document 1.

That may be length normalisation at play. Doc 2 is shorter, so it may be seen as a better match for that reason. Using the "explain" function helps illustrate the breakdown of scores in matches. You could try index
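To make the length-normalisation point concrete, here is a minimal sketch in plain Java, with no Lucene dependency. It assumes the default norm of that era, 1/sqrt(number of terms in the field), and that all values of a multi-valued field count toward a single field length. The class and method names are made up for illustration, and the sketch ignores the lossy single-byte encoding Lucene uses when storing norms.

```java
// Hypothetical illustration only; not Lucene code.
class LengthNormSketch {
    // Assumed default: norm = 1 / sqrt(number of terms indexed in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Counts whitespace-separated terms across all values of one field.
    static int countTerms(String... values) {
        int n = 0;
        for (String v : values) {
            String trimmed = v.trim();
            if (!trimmed.isEmpty()) {
                n += trimmed.split("\\s+").length;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The two example documents from the original question.
        int doc1Terms = countTerms("X", "X Y", "X Y Z"); // 6 terms
        int doc2Terms = countTerms("X Y K");             // 3 terms
        System.out.printf("doc 1: %d terms, norm %.3f%n", doc1Terms, lengthNorm(doc1Terms));
        System.out.printf("doc 2: %d terms, norm %.3f%n", doc2Terms, lengthNorm(doc2Terms));
    }
}
```

Under these assumptions the shorter document 2 gets the larger norm, which is one factor among several; the output of explain() remains the way to see which factor actually dominates.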

Multi Value field

2009-07-07 Thread John Seer
Hello,

I have a 100k index whose documents have one searchable field. That field has multiple values, for example:

doc( search: X  search: X Y  search: X Y Z  id:1)
doc( search: X Y K  id:2)

I am using StandardAnalyzer for building and searching, and I am having a problem with scores if the term is "

Re: Optimizing unordered queries

2009-07-07 Thread Jason Rutherglen
Ah OK, I was thinking we'd wait for the new flex indexing patch. I had started working along these lines before and will take it on as a project (which is, I believe, reducing the memory consumption of the term dictionary). I plan to segue it into the tag index at some point.

On Tue, Jul 7, 2009 at

Alternative scoring of BooleanQuery

2009-07-07 Thread Klaus Malorny
Hi all,

sorry if this is a FAQ or has been answered on the list earlier, but unfortunately I did not find a decent way to search the archive (maybe a job for Lucene ;-) ).

For some reason, I had to split my document into multiple fields. For the search, I create a query with two subqueries

Re: Boolean retrieval

2009-07-07 Thread Lukas Michelbacher
> Seems a long-winded way of producing a BooleanFilter but I guess you are
> trying to work with user input in the form of query strings.

Yes I am. I had the same impression but I couldn't figure out a more straightforward way.

> The bug in your code is that clause.getQuery().getString() is not

Re: Boolean retrieval

2009-07-07 Thread Michael McCandless
On Tue, Jul 7, 2009 at 5:39 AM, mark harwood wrote:
> Given the requirement is to ignore scoring I would recommend (as someone else
> suggested) looking at the IndexSearcher.search method that takes a HitCollector
> and simply accumulate all results, regardless of score.

Make that "Collector" (ne

Re: Optimizing unordered queries

2009-07-07 Thread Michael McCandless
OK good to hear you have a sane number of TermInfos now... I think many apps don't have nearly as many unique terms as you do; your approach (increase index divisor & LRU cache) sounds reasonable. It'll make warming more important. Please report back how it goes! Lucene is unfortunately rather w
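For readers unfamiliar with the LRU-cache half of that approach, here is a generic sketch built on java.util.LinkedHashMap in access order. It is not Lucene's internal term cache; the class name, the capacity, and the keys are illustrative assumptions only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene code.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("term-a", 1);
        cache.put("term-b", 2);
        cache.get("term-a");    // touch term-a, so term-b becomes the eldest
        cache.put("term-c", 3); // exceeds capacity and evicts term-b
        System.out.println(cache.keySet());
    }
}
```

A cache like this only helps once it has been populated, which is why warming matters more with this design.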

Re: Boolean retrieval

2009-07-07 Thread mark harwood
Seems a long-winded way of producing a BooleanFilter but I guess you are trying to work with user input in the form of query strings. The bug in your code is that clause.getQuery().getString() is not producing terms that are in your index - the first call to getTermsFilter passes the string "

Re: Boolean retrieval

2009-07-07 Thread Lukas Michelbacher
To test my Boolean queries, I have a small test collection where each document contains one of 1024 possible combinations of the strings "aaa", "bbb", ... "jjj". I tried wrapping a Boolean query like this (it's based on an older post to this list [1]):

private static TermsFilter getTermsFilter(St
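A test collection like the one described can be generated with a bitmask over the ten strings. The sketch below is plain Java with no Lucene dependency. It assumes that the elided strings are "ccc" through "iii" and that the empty combination counts as one of the 1024 documents; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the poster's code.
class TestCollectionSketch {
    // Assumed term list: the original only names "aaa", "bbb", ... "jjj".
    static final String[] TERMS = {
        "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj"
    };

    // Builds one document body per subset of TERMS (2^10 = 1024 subsets).
    static List<String> buildDocuments() {
        List<String> docs = new ArrayList<>();
        for (int mask = 0; mask < (1 << TERMS.length); mask++) {
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < TERMS.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (body.length() > 0) {
                        body.append(' ');
                    }
                    body.append(TERMS[i]);
                }
            }
            docs.add(body.toString());
        }
        return docs;
    }

    // Reference answer for a pure AND query: documents containing every required term.
    static long countContainingAll(List<String> docs, String... required) {
        long count = 0;
        for (String doc : docs) {
            if (Arrays.asList(doc.split(" ")).containsAll(Arrays.asList(required))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = buildDocuments();
        System.out.println("documents: " + docs.size());
        System.out.println("aaa AND bbb: " + countContainingAll(docs, "aaa", "bbb"));
    }
}
```

Because every subset appears exactly once, the expected hit count of any Boolean query can be computed independently and compared with what the filter returns.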