On Thu, Jul 26, 2012 at 5:38 AM, Simon Willnauer <simon.willna...@gmail.com> wrote:
> you really shouldn't do that! If you use lucene as a Primary key
> generator why don't you build your own on top. Just add one layer that
> accepts the document and returns the PID and internally put it in an
> ID field. Using no merge policy is not a good idea either you will
> very likely reach system boarders (# file descriptors) and suffer from
> bad search performance and low compression.
>
> I think you should really consider fixing your app instead of hacking lucene.

I can understand how they would end up in this situation, since we ended up in it as well. We tried using our own ID (which we still store in Lucene and still use for other purposes), and it slows some things down.

For example, when building bit sets for filters based on the external database, you now have to look up every ID you get back. Because the last row returned by the query might turn out to be Lucene doc ID 0, you can't build the filter at all unless you process every row the query returns. If the SQL query returned a million docs, you had to do a million term lookups in Lucene. We didn't have enough memory to hold the mapping from our ID back to Lucene's (we hit an OOME as soon as we tried to build a map to speed up the lookups), so caching that information was impossible at the time. I'm not sure whether this is getting easier or harder: memory sizes are increasing, but so is the number of docs people put into their indexes.

At the time, Lucene developers were adamant that we shouldn't be using the doc ID because deleted doc IDs eventually get reused (or rather, all the IDs shift downwards). But since we never physically delete documents (we want a history of item modification, including deletion, so doing that would be undesirable anyway), it was never a problem until the new merging came along.

I guess that as long as the doc ID is still available, people will continue to use it. If it disappeared from the API completely, that would be good encouragement to migrate to a different approach. :)

TX
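
P.S. To make the filter-building cost concrete, here is a rough sketch of the two paths. It uses only java.util and is not Lucene API: the DocIdLookup interface, the class name, and both method names are hypothetical stand-ins, with docIdFor() playing the role of one term lookup per external ID.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExternalIdFilter {

    /** Hypothetical stand-in for one term lookup per external ID (not a Lucene interface). */
    public interface DocIdLookup {
        /** Returns the internal doc ID, or -1 if the external ID is not indexed. */
        int docIdFor(long externalId);
    }

    /** Slow path: rows carry our own ID, so every row costs one lookup. */
    public static BitSet fromExternalIds(Iterable<Long> rows, DocIdLookup lookup, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (long externalId : rows) {
            int docId = lookup.docIdFor(externalId);
            if (docId >= 0) {
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Fast path: rows already carry the doc ID, so no lookup is needed. */
    public static BitSet fromDocIds(Iterable<Integer> rows, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : rows) {
            bits.set(docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy mapping from external ID to doc ID. In our case the real
        // mapping was too large to hold in memory, hence the per-row lookups.
        Map<Long, Integer> mapping = new HashMap<>();
        mapping.put(9001L, 2);
        mapping.put(9002L, 0);

        List<Long> sqlRows = Arrays.asList(9001L, 9002L);
        BitSet slow = fromExternalIds(sqlRows, id -> mapping.getOrDefault(id, -1), 3);
        BitSet fast = fromDocIds(Arrays.asList(2, 0), 3);

        // Both paths set the same bits; only the cost per row differs.
        System.out.println(slow.equals(fast));
    }
}
```

The point is only that the slow path does one lookup per row returned by the SQL query, while the fast path sets bits directly from the stored doc IDs.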