Re: document with different index time boost returns same score

2009-12-18 Thread prabin meitei
Thanks to all for the replies. I checked with Luke, and documents with different (but not very different) index-time boosts have the same fieldNorm. I think that is causing the search hits to have the same score. As Andrzej suggested, I checked the rounding error caused by the encoding. The result really surprises

Re: document with different index time boost returns same score

2009-12-18 Thread Andrzej Bialecki
On 2009-12-18 21:47, Tom Hill wrote: The docBoost, IIRC, is stored in a single byte, which combines the doc boost, the field boost, and the length norm. ( http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm ) Are the lengths of your documents the sa

Re: document with different index time boost returns same score

2009-12-18 Thread Tom Hill
The docBoost, IIRC, is stored in a single byte, which combines the doc boost, the field boost, and the length norm. ( http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm ) Are the lengths of your documents the same? If not, this could be affecting you
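
To make the single-byte quantization concrete, here is a minimal sketch in plain Java. It is a from-memory reimplementation of the 8-bit float encoding Lucene of this era is believed to use for norms (the SmallFloat "315" format); the class name NormQuantizationSketch and the methods encode and decode are hypothetical names chosen for illustration, and the bit layout should be checked against the Lucene source before relying on it. It also encodes the raw boost alone, whereas the stored norm is the product of doc boost, field boost, and length norm.

```java
// Sketch only: assumed to mirror Lucene's 8-bit norm encoding, not taken from the thread.
final class NormQuantizationSketch {

    // Assumed format parameters: shift of 3 bits, zero exponent of 15.
    private static final int ZERO = (63 - 15) << 3;

    /** Squeeze a float into one byte, truncating low-order mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO) {
            // Zero and negatives map to 0; tiny positives map to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest representable value
        }
        return (byte) (small - ZERO);
    }

    /** Expand a stored byte back into the float it stands for. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Boost values quoted in the original question.
        float[] boosts = {5.64f, 5.25f, 5.10f, 4.8f, 4.4f, 4.0f, 1.0f};
        for (float boost : boosts) {
            byte stored = encode(boost);
            System.out.println(boost + " -> byte " + (stored & 0xff) + " -> " + decode(stored));
        }
    }
}
```

Under this sketch, 5.64, 5.25 and 5.10 all decode to 5.0, and 4.8, 4.4 and 4.0 all decode to 4.0, because only four distinct values survive between each power of two. That would be consistent with nearby boosts producing identical fieldNorm values, while wildly different boosts (say 1.0 versus 10.0) still decode differently.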

Re: Query joining 2 indexes

2009-12-18 Thread frer
Just re-read my post and I don't think it was clear. The algorithm I ended up doing is:

for all daily data
    gather hourly ids in a set
    build map with placeholders for hourly values
get hourly documents from set
for all daily data
    insert hourly data from document
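
The two-pass approach described in this message can be sketched in plain Java as follows. This is one possible reading of the steps, not the poster's actual code: the Daily class, the join method, and the fetchHourly callback are hypothetical stand-ins, and in-memory maps take the place of the two Lucene indexes. The callback represents the single bulk lookup against the Hourly index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch only: plain collections stand in for the Daily and Hourly indexes.
final class DailyHourlyJoinSketch {

    /** Hypothetical Daily document: its own id plus the hourly ids it references. */
    static final class Daily {
        final String id;
        final List<String> hourlyIds;

        Daily(String id, List<String> hourlyIds) {
            this.id = id;
            this.hourlyIds = hourlyIds;
        }
    }

    /** Returns daily id -> hourly values, using exactly one bulk fetch. */
    static Map<String, List<String>> join(
            List<Daily> dailies,
            Function<Set<String>, Map<String, String>> fetchHourly) {
        // Pass 1: gather each referenced hourly id once and create empty placeholders.
        Set<String> wanted = new LinkedHashSet<>();
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Daily d : dailies) {
            wanted.addAll(d.hourlyIds);
            result.put(d.id,
                    new ArrayList<>(Collections.nCopies(d.hourlyIds.size(), (String) null)));
        }

        // One bulk lookup instead of one lookup per daily document.
        Map<String, String> hourly = fetchHourly.apply(wanted);

        // Pass 2: fill each placeholder from the fetched hourly data.
        for (Daily d : dailies) {
            List<String> slots = result.get(d.id);
            for (int i = 0; i < d.hourlyIds.size(); i++) {
                slots.set(i, hourly.get(d.hourlyIds.get(i)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> hourlyIndex = new HashMap<>();
        hourlyIndex.put("h1", "10:00 data");
        hourlyIndex.put("h2", "11:00 data");
        hourlyIndex.put("h3", "12:00 data");

        List<Daily> dailies = Arrays.asList(
                new Daily("d1", Arrays.asList("h1", "h2")),
                new Daily("d2", Arrays.asList("h2", "h3")));

        Map<String, List<String>> joined = join(dailies, ids -> {
            Map<String, String> found = new HashMap<>();
            for (String id : ids) {
                found.put(id, hourlyIndex.get(id));
            }
            return found;
        });
        System.out.println(joined);
    }
}
```

Collecting the ids into a set first means an hourly document shared by several daily documents is fetched only once, which is the main saving over issuing one request per daily hit.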

Re: Query joining 2 indexes

2009-12-18 Thread frer
Hey Erick, OK, that's what I thought. Unfortunately, doing that will take pretty much the same time as what I was doing (at least I think it will). What I ended up doing is fetching them all with one big query. I obviously needed to change the MaxClauseCount, but in my case this is an acceptable sol

Re: document with different index time boost returns same score

2009-12-18 Thread Michael McCandless
The boost is stored in the index using a single byte, ie, heavily quantized... that may explain what you are seeing? If you make the boosts wildly different do you then see a score difference? Mike On Fri, Dec 18, 2009 at 12:40 PM, prabin meitei wrote: > Hi, >   I have an index in which documen

document with different index time boost returns same score

2009-12-18 Thread prabin meitei
Hi, I have an index in which documents are inserted with different boosts during indexing, e.g. doc1 has boost 5.64, doc2 has boost 5.25, doc3 has boost 5.10, doc4 has boost 4.8, doc5 has boost 4.4, doc6 has boost 4.0, and so on, with some documents having a boost of only 1.0. When I search the index for a

Re: Query joining 2 indexes

2009-12-18 Thread Erick Erickson
From your original post... <<>> So it's just the IDs to your hourly index contained in each Daily document. The problem here is that this ID is probably NOT the Lucene ID, so my original idea needs some refinement. Assuming that your IDs to the hourly index are contained in your daily document, you

Re: Query joining 2 indexes

2009-12-18 Thread frer
Thanks for your answer. I didn't think that using: Document doc irHourly.doc(); would be much faster than using the searcher. I will try that. I have one question though: what is the you refer to? Since I have searched in the daily index, what is the corresponding hit on the hourly i

Re: Query joining 2 indexes

2009-12-18 Thread Erick Erickson
Well, making a large OR clause is definitely more efficient than making N different requests, but you would have to search the results. It doesn't sound very performant. Could you go to 50,000 ids? Yes, but you have to fiddle with setMaxClauseCount because Lucene defaults to a max of 1,024. There

Query joining 2 indexes

2009-12-18 Thread François Eric
Hello, I have a performance problem and would need expert advice on how to go about fixing it: I currently have 2 indexes: Daily and Hourly. The Daily index contains about 1,000,000 documents and my Hourly index approximately: 24,000,000 documents. My Daily index contains many fields and s

Re: External sort

2009-12-18 Thread Toke Eskildsen
On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote: > I am using Integer for datetime. As the data grows, I am hitting the > upper limit. Could you give us some numbers? Document count, index size in GB, amount of RAM available for Lucene? > One option most of them in the group discussed about usin

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread Michael McCandless
I'm not sure this specific detail (how IW uses Similarity) is documented -- best "documentation" is the source code ;) Have a look at oal.index.NormsWriterPerField. That's where the default indexing chain asks Similarity to create the norm. Mike On Fri, Dec 18, 2009 at 5:12 AM, kdev wrote: > >

Re: External sort

2009-12-18 Thread Ganesh
Thanks for all your ideas. I was expecting the sorting-related fix in 3.0, but it would be great if it gets into 3.1. I am using Integer for datetime. As the data grows, I am hitting the upper limit. As my application is part of the product, used in different environments, we cannot

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread kdev
The avg is used only in the idf method of the Similarity class, so I guess there is a workaround for what I want to do. Can you give me a reference, in the Lucene docs, on how an IndexWriter uses the provided Similarity class? Thanks again for your time and your help. Michael McCandless-2 wrote: > > I