Thanks to all for the replies. I checked with Luke, and documents with
different index-time boosts (which are not very different from each other)
have the same fieldNorm. I think that is what is causing the search hits to
have the same score.
As Andrzej suggested, I checked the rounding error caused by the encoding. The
result really surprised me.
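For anyone who wants to reproduce that check, here is a minimal sketch (assuming Lucene 2.4.x, where Similarity still exposes the static encodeNorm/decodeNorm helpers) showing how nearby boost values collapse onto the same norm byte; the boost values are just examples:

    import org.apache.lucene.search.Similarity;

    public class NormRounding {
        public static void main(String[] args) {
            float[] boosts = { 5.64f, 5.25f, 5.10f, 4.8f, 4.4f, 4.0f };
            for (float b : boosts) {
                byte encoded = Similarity.encodeNorm(b);
                // The encoding keeps only a 3-bit mantissa, so close values share a byte.
                System.out.println(b + " -> byte " + encoded
                    + " -> decoded " + Similarity.decodeNorm(encoded));
            }
        }
    }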
On 2009-12-18 21:47, Tom Hill wrote:
The docBoost, IIRC, is stored in a single byte, which combines the doc
boost, the field boost, and the length norm.
(http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm)
Are the lengths of your documents the same? If not, this could be affecting
you.
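To make that combination concrete, a small sketch with DefaultSimilarity (the boost and token counts are invented): the same boost produces different norms for different field lengths, and only the encoded byte is what search sees.

    import org.apache.lucene.search.DefaultSimilarity;
    import org.apache.lucene.search.Similarity;

    public class NormCombination {
        public static void main(String[] args) {
            Similarity sim = new DefaultSimilarity();
            float docBoost = 5.25f, fieldBoost = 1.0f;
            // norm = docBoost * fieldBoost * lengthNorm(numTokens), then encoded to one byte
            for (int numTokens : new int[] { 10, 500 }) {
                float norm = docBoost * fieldBoost * sim.lengthNorm("body", numTokens);
                System.out.println(numTokens + " tokens -> norm " + norm
                    + " -> byte " + Similarity.encodeNorm(norm));
            }
        }
    }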
Just re-read my post and I don't think it was clear.
The algorithm I ended up with is:

for all daily data:
    gather hourly ids in a set
    build map with placeholders for hourly values
get hourly documents for the ids in the set
for all daily data:
    insert hourly data from the fetched documents
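For what it's worth, a rough Java sketch of that flow; the field names "hourlyId" and "id" and the reader handling are placeholders, not the poster's actual code:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class HourlyLookup {
        public static Map<String, Document> fetchHourly(List<Document> dailyHits,
                                                        IndexReader irHourly) throws IOException {
            // Pass 1: collect every hourly id referenced by the daily hits.
            Map<String, Document> hourlyById = new HashMap<String, Document>();
            for (Document daily : dailyHits) {
                String[] ids = daily.getValues("hourlyId");
                if (ids == null) continue; // defensive: field may be absent
                for (String id : ids) {
                    hourlyById.put(id, null); // placeholder, filled below
                }
            }
            // One lookup pass over the hourly index instead of one search per id.
            for (String id : hourlyById.keySet()) {
                TermDocs td = irHourly.termDocs(new Term("id", id));
                if (td.next()) {
                    hourlyById.put(id, irHourly.document(td.doc()));
                }
                td.close();
            }
            // Pass 2: callers merge these documents back into the daily rows.
            return hourlyById;
        }
    }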
Hey Erick,
OK, that's what I thought. Unfortunately, doing that will pretty much take
the same time as what I was doing (at least I think it will).
What I ended up doing is fetching them all with a big query. I obviously
needed to change the MaxClauseCount, but in my case this is an acceptable
solution.
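For the archive, roughly what that looks like; the field name "id" is an assumption on my part:

    import java.util.Collection;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class BigOrQuery {
        public static BooleanQuery build(Collection<String> hourlyIds) {
            // The default cap is 1,024 clauses; raise it before building a large OR.
            if (hourlyIds.size() > BooleanQuery.getMaxClauseCount()) {
                BooleanQuery.setMaxClauseCount(hourlyIds.size());
            }
            BooleanQuery q = new BooleanQuery();
            for (String id : hourlyIds) {
                q.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.SHOULD);
            }
            return q;
        }
    }

Note that setMaxClauseCount is a static, JVM-wide setting, so raising it affects every BooleanQuery in the process.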
The boost is stored in the index using a single byte, i.e., heavily
quantized... that may explain what you are seeing?
If you make the boosts wildly different do you then see a score difference?
Mike
On Fri, Dec 18, 2009 at 12:40 PM, prabin meitei wrote:
Hi,
I have an index in which documents are inserted with different boosts
during indexing, e.g.:
doc1 has boost 5.64
doc2 has boost 5.25
doc3 has boost 5.10
doc4 has boost 4.8
doc5 has boost 4.4
doc6 has boost 4.0
and so on... some documents even have a boost of only 1.0.
When I search the index for a term, documents with different boosts get the same score.
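For readers of the archive, roughly how boosts like these are applied at index time (the field name and writer setup are only illustrative). Because of the single-byte norm encoding mentioned above in the thread, values as close as 5.25 and 5.10 can easily end up with identical norms:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BoostedIndexing {
        // Adds one document whose whole-document boost is folded into its field norms.
        static void addWithBoost(IndexWriter writer, String text, float boost) throws IOException {
            Document doc = new Document();
            doc.add(new Field("title", text, Field.Store.YES, Field.Index.ANALYZED));
            doc.setBoost(boost);
            writer.addDocument(doc);
        }
    }

Usage would be along the lines of addWithBoost(writer, "doc1", 5.64f), addWithBoost(writer, "doc2", 5.25f), and so on.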
From your original post...
So it's just the IDs to your hourly index contained in each Daily document.
The problem here is that this ID is probably NOT the Lucene ID, so my original
idea needs some refinement. Assuming that your IDs to the hourly index
are contained in your daily document, you
Thanks for your answer.
I didn't think that using
    Document doc = irHourly.doc();
would be much faster than using the searcher. I will try that.
I have one question though: what is the docId you refer to? Since I
have searched in the daily index, what is the corresponding hit on the
hourly index?
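A note for the archive: the number passed to doc()/document() is Lucene's internal document number in the hourly index, not the position of the hit in the daily search. Two sketches of getting from an application-level id to the stored hourly document, with the field name "id" assumed:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class HourlyDocLookup {
        // Via a search: the docID comes from the hourly hit, not the daily one.
        static Document viaSearcher(IndexSearcher hourlySearcher, String id) throws IOException {
            TopDocs hits = hourlySearcher.search(new TermQuery(new Term("id", id)), null, 1);
            return hits.totalHits == 0 ? null : hourlySearcher.doc(hits.scoreDocs[0].doc);
        }

        // Via the reader directly, skipping scoring.
        static Document viaReader(IndexReader irHourly, String id) throws IOException {
            TermDocs td = irHourly.termDocs(new Term("id", id));
            try {
                return td.next() ? irHourly.document(td.doc()) : null;
            } finally {
                td.close();
            }
        }
    }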
Well, making a large OR clause is definitely more efficient than making
N different requests, but you would have to search the results. It doesn't
sound very performant.
Could you go to 50,000 ids? Yes, but you have to fiddle with setMaxClauseCount,
because Lucene defaults to a max of 1,024 clauses.
There
Hello,
I have a performance problem and would need expert advice on how to go
about fixing it:
I currently have 2 indexes: Daily and Hourly. The Daily index contains
about 1,000,000 documents and my Hourly index approximately 24,000,000
documents. My Daily index contains many fields and s
On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote:
> I am using Integer for datetime. As the data grows, I am hitting the
> upper limit.
Could you give us some numbers? Document count, index size in GB, amount
of RAM available for Lucene?
> One option most of them in the group discussed about using
I'm not sure this specific detail (how IW uses Similarity) is
documented -- best "documentation" is the source code ;)
Have a look at oal.index.NormsWriterPerField. That's where the
default indexing chain asks Similarity to create the norm.
Mike
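In case it helps, a minimal sketch of plugging a custom Similarity into that chain: the subclass below ignores field length so the norm reflects only the boosts, but the result is still squeezed into one byte. This is just one possible wiring, not necessarily what the original poster needs:

    import org.apache.lucene.search.DefaultSimilarity;

    public class BoostOnlySimilarity extends DefaultSimilarity {
        // Called by the indexing chain (via NormsWriterPerField) for each field.
        public float lengthNorm(String fieldName, int numTokens) {
            return 1.0f; // drop the length component; boosts alone drive the norm
        }
    }

Both the IndexWriter and any searcher should then be handed the same Similarity, e.g. writer.setSimilarity(new BoostOnlySimilarity()) and searcher.setSimilarity(new BoostOnlySimilarity()).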
Thanks for all your ideas. I was expecting the sorting-related fix in 3.0, but
it would be great if it gets into 3.1.
I am using Integer for datetime. As the data grows, I am hitting the upper
limit. As my application is part of a product that is used in different environments,
we cannot
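Not knowing the exact constraints, one option (assuming an upgrade to Lucene 2.9 or later is possible) is to index the date-time as a long via NumericField and sort on it as a long, which removes the 32-bit ceiling; the field name and resolution are just examples:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    public class LongTimestamp {
        // Indexing: keep the date-time as epoch milliseconds in a long field.
        static Document withTimestamp(long epochMillis) {
            Document doc = new Document();
            doc.add(new NumericField("timestamp", Field.Store.YES, true)
                    .setLongValue(epochMillis));
            return doc;
        }

        // Searching: sort numerically on the long field, newest first.
        static Sort newestFirst() {
            return new Sort(new SortField("timestamp", SortField.LONG, true));
        }
    }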
The avg is used only in the idf method of the Similarity class, so I guess
there is a workaround for what I want to do. Can you give me a reference, in the
Lucene docs, on how an IndexWriter uses the provided Similarity class?
Thanks again for your time and your help.