Thanks Mike. Joins could be slower than a docID-based approach, no?
It would be great if Lucene could incorporate an external docID after
weighing the pros and cons. Many users like us would be willing to trade
off some search latency in return for this low-hanging fruit.

--
Ravi

On Fri, Nov 2, 2012 at 9:06 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> I suspect app-controlled docIDs will be a challenge, but I haven't
> thought it through much.
>
> One possible solution might be to use joins? Either index-time or
> query-time...
>
> I.e., make a document that has the big text field that never changes,
> and a separate document that has all the little fields that change
> frequently, joined by a common field.
>
> Then you can freely update the little fields without changing the big
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Oct 25, 2012 at 6:10 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > We need to re-index some fields in our application frequently.
> >
> > Our typical document consists of:
> >
> > a) Many single-valued {long/int} re-indexable fields
> > b) A few large-valued {text/string} static fields
> >
> > We have to re-index an entire document when a single small field
> > changes, and this is turning out to be a problem for us. I have gone
> > through the https://issues.apache.org/jira/browse/LUCENE-3837 proposal,
> > which tries to work around this limitation using a secondary mapping
> > of new-to-old docIDs.
> >
> > As I understand it, Lucene strictly maintains internal docID order so
> > that the many queries that depend on it work correctly. Segment merges
> > also preserve this order, while reclaiming deleted docIDs.
> >
> > There should be many applications like ours that manage index shards,
> > limiting each shard based on docID limits or size. So reclaiming
> > deleted docIDs is mostly a non-issue for us.
> >
> > That leaves us with changing docIDs. How about leaving the docIDs
> > themselves open to applications, at least as an option for those who
> > need it? Such an approach might interleave docIDs across segments, but
> > within a segment the docIDs would always be in increasing order. There
> > are possibilities of ghost deletes, duplicate docIDs, etc., but all
> > should be solvable, I believe.
> >
> > Fronting these docIDs during search from all segment readers and
> > returning the correct value from one of them should be easy. Will it
> > incur a heavy penalty during search? Another advantage is that
> > cross-joining indexes becomes trivial when docIDs are fixed.
> >
> > There must be many other places where an app-supplied docID might make
> > Lucene behave funny. I need some help identifying those areas, at least
> > to understand this problem correctly, if not to solve it altogether.
> >
> > --
> > Ravi
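Mike's split-document suggestion can be modelled without Lucene to show what a query-time join does. The sketch below is a toy, in-memory illustration in plain Java (16+, for records), not Lucene code; the class, record, and method names (`SplitDocJoin`, `StaticDoc`, `DynamicDoc`, `upsertDynamic`, `queryTimeJoin`) and the example fields are hypothetical. It assumes each logical record is split into a static document holding the big text field and a dynamic document holding the small fields, sharing a `key` join field. Lucene's join module (`JoinUtil.createJoinQuery`) follows a similar two-phase pattern, collecting join-field terms from documents matching a "from" query and then matching those terms on the "to" side, but its real API and cost profile differ from this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Toy, in-memory model of splitting one logical record into a static and a
 * dynamic document that share a join key (hypothetical names; not Lucene's API).
 */
final class SplitDocJoin {
    /** Static side: the large text field, written once and never updated. */
    record StaticDoc(String key, String body) {}

    /** Dynamic side: the small, frequently changing fields. */
    record DynamicDoc(String key, long price, int stock) {}

    final List<StaticDoc> staticDocs = new ArrayList<>();
    // Keyed by join key, so an update replaces only the small document.
    final Map<String, DynamicDoc> dynamicDocs = new HashMap<>();

    void addStatic(StaticDoc doc) {
        staticDocs.add(doc);
    }

    /** Updating the small fields never rewrites the big body text. */
    void upsertDynamic(DynamicDoc doc) {
        dynamicDocs.put(doc.key(), doc);
    }

    /**
     * Query-time join in two phases: (1) collect the join keys of dynamic docs
     * matching fromFilter, (2) return static docs whose key is in that set and
     * which also match toFilter.
     */
    List<StaticDoc> queryTimeJoin(Predicate<DynamicDoc> fromFilter, Predicate<StaticDoc> toFilter) {
        Set<String> keys = new HashSet<>();
        for (DynamicDoc d : dynamicDocs.values()) {
            if (fromFilter.test(d)) {
                keys.add(d.key());
            }
        }
        List<StaticDoc> hits = new ArrayList<>();
        for (StaticDoc s : staticDocs) {
            if (keys.contains(s.key()) && toFilter.test(s)) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SplitDocJoin index = new SplitDocJoin();
        index.addStatic(new StaticDoc("a", "lucene in action"));
        index.addStatic(new StaticDoc("b", "lucene internals"));
        index.upsertDynamic(new DynamicDoc("a", 40, 3));
        index.upsertDynamic(new DynamicDoc("b", 55, 0));
        // Restocking "b" replaces only its small document.
        index.upsertDynamic(new DynamicDoc("b", 55, 7));
        List<StaticDoc> hits = index.queryTimeJoin(d -> d.stock() > 0, s -> s.body().contains("lucene"));
        System.out.println(hits.size()); // prints 2
    }
}
```

In this model every join first visits all matching dynamic documents and builds a key set before the static side can be filtered, so its cost grows with the number of "from" matches. That is one plausible reason a join could be slower than lining documents up by a fixed docID, as Ravi suggests, though actual latency in Lucene would need measuring.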
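Ravi's proposal is to let the application assign docIDs that stay fixed across updates, with each segment keeping its docIDs in increasing order and a top-level reader "fronting" the segments. The toy model below, plain Java (16+) rather than Lucene code, sketches one way such fronting could work under assumptions that are not from the thread: segments are kept oldest to newest, an updated document is re-added to a newer segment under the same docID, and the newest copy wins (so older copies are the duplicates Ravi mentions). Deletes are not modelled. All names (`FrontingReader`, `AppIdSegment`, `lookup`, `docIdsInOrder`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy model of application-supplied docIDs fronted across segments
 * (hypothetical names and rules; not Lucene's API).
 */
final class FrontingReader {
    /** One segment: app docIDs in strictly increasing order, plus a stored value per doc. */
    record AppIdSegment(int[] docIds, String[] values) {
        AppIdSegment {
            if (docIds.length != values.length) {
                throw new IllegalArgumentException("docIds and values differ in length");
            }
            for (int i = 1; i < docIds.length; i++) {
                if (docIds[i] <= docIds[i - 1]) {
                    throw new IllegalArgumentException("docIds must be strictly increasing");
                }
            }
        }

        /** Position of docId in this segment, or -1; relies on the increasing order. */
        int find(int docId) {
            int i = Arrays.binarySearch(docIds, docId);
            return i >= 0 ? i : -1;
        }
    }

    // Oldest segment first; a newer copy of the same docID shadows older ones.
    private final List<AppIdSegment> segments = new ArrayList<>();

    void addSegment(AppIdSegment segment) {
        segments.add(segment);
    }

    /** Newest-wins lookup: up to one binary search per segment. */
    String lookup(int docId) {
        for (int s = segments.size() - 1; s >= 0; s--) {
            AppIdSegment seg = segments.get(s);
            int i = seg.find(docId);
            if (i >= 0) {
                return seg.values()[i];
            }
        }
        return null;
    }

    /**
     * Every distinct docID in increasing order. Because docIDs interleave across
     * segments, this needs a k-way merge with de-duplication instead of laying
     * segments end to end.
     */
    List<Integer> docIdsInOrder() {
        // Heap entries are {docId, segmentIndex, position}, ordered by docId.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < segments.size(); s++) {
            int[] ids = segments.get(s).docIds();
            if (ids.length > 0) {
                heap.add(new int[] {ids[0], s, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            // A docID present in several segments is emitted only once.
            if (out.isEmpty() || out.get(out.size() - 1) != top[0]) {
                out.add(top[0]);
            }
            int[] ids = segments.get(top[1]).docIds();
            int next = top[2] + 1;
            if (next < ids.length) {
                heap.add(new int[] {ids[next], top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FrontingReader reader = new FrontingReader();
        reader.addSegment(new AppIdSegment(new int[] {1, 5, 9}, new String[] {"a1", "e1", "i1"}));
        // Doc 5 was updated: re-added under the same app docID in a newer segment.
        reader.addSegment(new AppIdSegment(new int[] {3, 5}, new String[] {"c2", "e2"}));
        System.out.println(reader.lookup(5));       // prints e2
        System.out.println(reader.docIdsInOrder()); // prints [1, 3, 5, 9]
    }
}
```

For contrast, Lucene's composite readers give each segment a contiguous range of docIDs starting at its `docBase`, so a top-level docID maps to a segment without any merge. The interleaving in this sketch is where an extra per-lookup and per-iteration cost would come from, which is one concrete angle on Ravi's "heavy penalty during search" question.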