Re: Document-Ids and Merges

2012-04-05 Thread Christoph Kaser
Thank you both, Mike and Shai, for your answers. If anyone has a similar problem: I ended up using a field that holds my own "document IDs", whose values I load through the FieldCache. I then precalculate the indirection per IndexReader and store it in a WeakHashMap to save the extra lookup.
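A minimal sketch of the approach described above, under stated assumptions: the reader is modeled as a plain `Object` key (standing in for an `IndexReader`), `idsByDoc[docId]` stands in for the ID values loaded from the FieldCache, and `slotById` is a hypothetical map from those IDs to positions in a live float array of scores. Only the docId-to-slot indirection is cached, so score updates stay visible without rebuilding anything.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch, not Lucene API: "reader" stands in for an IndexReader,
// idsByDoc stands in for the per-document ID values loaded via the FieldCache.
final class DocIdIndirectionCache {
    // Weak keys: once a reader is no longer referenced, its entry can be garbage collected.
    private final Map<Object, int[]> slotsByReader = new WeakHashMap<>();

    /** Returns docId -> slot in the live score array, computed once per reader. */
    synchronized int[] slotsFor(Object reader, String[] idsByDoc, Map<String, Integer> slotById) {
        int[] slots = slotsByReader.get(reader);
        if (slots == null) {
            slots = new int[idsByDoc.length];
            for (int docId = 0; docId < idsByDoc.length; docId++) {
                // Resolve the (comparatively expensive) ID -> slot lookup once, up front.
                Integer slot = slotById.get(idsByDoc[docId]);
                slots[docId] = (slot == null) ? -1 : slot; // -1: document has no score slot
            }
            slotsByReader.put(reader, slots);
        }
        return slots;
    }

    /** Reads the current score through the cached indirection; 0 if the doc has no slot. */
    static float score(int[] slots, float[] liveValues, int docId) {
        int slot = slots[docId];
        return slot < 0 ? 0f : liveValues[slot];
    }
}
```

Because the cache stores slots rather than scores, the `liveValues` array can be updated in bulk at any time; only a new or reopened reader triggers a rebuild of the indirection.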

Re: Document-Ids and Merges

2012-03-28 Thread Michael McCandless
On Wed, Mar 28, 2012 at 3:37 AM, Christoph Kaser wrote:
> Thank you for your answer!
>
> That's too bad. I thought of using my own ID-field, but I wanted to save the
> additional indirection (from docId to my ID to my value).
> Do document IDs remain constant for one IndexReader as long as it isn'

Re: Document-Ids and Merges

2012-03-28 Thread Shai Erera
Hi, if you are working with trunk, then I believe that DocValues is what you're looking for. They allow you to store values at the document level, and then read them during search either from disk or RAM. They are also segment-based. I'm not sure how ValueSource is used (I've never used it myself and

Re: Document-Ids and Merges

2012-03-28 Thread Christoph Kaser
Thank you for your answer! That's too bad. I thought of using my own ID-field, but I wanted to save the additional indirection (from docId to my ID to my value). Do document IDs remain constant for one IndexReader as long as it isn't reopened? If so, I could precalculate the indirection. Best

Re: Document-Ids and Merges

2012-03-28 Thread Christoph Kaser
Hi Shai, that sounds interesting. However, I am unsure how I can do this. Is there a way to store values "with a segment"? How can I get the segment from a document ID? Here is what my ValueSource looks like at the moment: public class MyScoreValues extends ValueSource { float[] values=...
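One way to answer the segment question, sketched without any Lucene classes: if `docStarts[i]` (a hypothetical array) holds the first top-level docId of segment i, a binary search finds the segment, and subtracting its start gives the segment-local docId. Note that with per-segment search (Lucene 2.9 and later), the reader handed to a ValueSource is typically the segment's own reader, so this translation may not be needed at all.

```java
// Hypothetical helper, not a Lucene class. docStarts[i] is the first top-level
// docId of segment i; docStarts[0] == 0 and the array is non-decreasing
// (empty segments repeat the previous start).
final class SegmentLocator {
    /** Returns the index of the last segment whose start is <= docId. */
    static int segmentOf(int docId, int[] docStarts) {
        int lo = 0, hi = docStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle so lo always advances
            if (docStarts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts a top-level docId into the docId local to its segment. */
    static int localDocId(int docId, int[] docStarts) {
        return docId - docStarts[segmentOf(docId, docStarts)];
    }
}
```

Picking the *last* segment whose start is at most `docId` makes empty segments (which share a start with their successor) harmless.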

Re: Document-Ids and Merges

2012-03-27 Thread Shai Erera
Or ... move to a per-segment array. Then you don't need to worry about doc IDs changing. You will need to build the array from the documents that are in that segment only. It's like FieldCache in a way. The array is relevant as long as the segment exists (i.e. not merged away). Hope this helps.
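A rough sketch of the per-segment array idea, assuming a hypothetical per-segment key object (for example, the segment's reader) and a caller-supplied function that returns the score for a segment-local docId. Each array is only as large as its segment and is built once; when a merge replaces segments, the new segment gets a new key and a freshly built array, while the old entries become collectable.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch, not Lucene API: segmentKey stands in for a per-segment
// reader, and scoreOfLocalDoc maps a segment-local docId to its current score.
final class PerSegmentScores {
    // Weak keys: arrays for segments that were merged away can be garbage collected.
    private final Map<Object, float[]> bySegment = new WeakHashMap<>();

    /** Returns the score array for one segment, building it on first use. */
    synchronized float[] valuesFor(Object segmentKey, int maxDoc, IntToDoubleFunction scoreOfLocalDoc) {
        return bySegment.computeIfAbsent(segmentKey, key -> {
            float[] values = new float[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = (float) scoreOfLocalDoc.applyAsDouble(doc);
            }
            return values;
        });
    }

    /** Updates one document's score in place, if its segment's array has been built. */
    synchronized void update(Object segmentKey, int localDoc, float newScore) {
        float[] values = bySegment.get(segmentKey);
        if (values != null) {
            values[localDoc] = newScore;
        }
    }
}
```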

Re: Document-Ids and Merges

2012-03-27 Thread Michael McCandless
In general, how Lucene assigns docIDs is a volatile implementation detail: it's free to change from release to release. E.g., the default merge policy (TieredMergePolicy) merges out-of-order (non-adjacent) segments. Another example: at one point, IndexSearcher re-ordered the segments on init. Another: because Concurre
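To illustrate why docIDs are unstable, here is a toy model (not Lucene's actual merge code): segments are arrays of keys with `null` marking deleted documents, and a merge concatenates the selected segments in whatever order the merge policy picks, dropping deletions. The same document can end up with a different docId depending on deletions and merge order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; Lucene's real merge code is far more involved.
final class MergeRenumbering {
    /**
     * Concatenates the given segments in merge order, skipping deleted
     * documents (null entries). The position of each key in the returned
     * list is its new docId within the merged segment.
     */
    static List<String> merge(List<String[]> segmentsInMergeOrder) {
        List<String> merged = new ArrayList<>();
        for (String[] segment : segmentsInMergeOrder) {
            for (String key : segment) {
                if (key != null) { // deleted documents are compacted away
                    merged.add(key);
                }
            }
        }
        return merged;
    }
}
```

With segments `{"a0", null, "a2"}` and `{"b0", "b1"}`, merging in that order puts `a2` at docId 1 (it was docId 2 before the merge); merging in the reverse order puts it at docId 3.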

Document-Ids and Merges

2012-03-27 Thread Christoph Kaser
Hi all, I have a search application with 16 million documents that applies custom per-document scores via a ValueSource. These values are updated frequently (and sometimes all at once), so I can't really write them into the index for performance reasons. Instead, I simply have a huge array of float