A merge will also change doc IDs. Within every segment, doc IDs begin
at 0.

2011/3/30 Trejkaz <trej...@trypticon.org>:
> On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>> I'm always skeptical of storing the doc IDs since they can
>> change out from underneath you (just delete even a single
>> document and optimize).
>
> We never delete documents. Even when a feature request came in to
> update documents (i.e. delete the old one and add a new version), we
> ended up keeping the old version around, partially because we didn't
> want the IDs to shift (which is a bit of a recursive argument), but
> also because it's forensically sound to have the previous versions
> around so people can see what edits were made.
>
>> What is it you're doing with the doc ID that you couldn't do with the guid?
>> If your "guid list"
>> were ordered, I can imagine building filters quite quickly from
>> it using TermDocs.skipTo for instance..
>
> The main problem with filters is that DocIdBitSet's iterator has to
> return the doc IDs in order.
>
> Even if our GUIDs are in order (they would be, as it would be the
> primary key on tables using them), they won't be in the same order as
> the IDs of the docs they came from. So for each row in the ResultSet,
> you need to do a TermDocs.seek(Term). This not only costs the
> additional I/O (and it's a lot more than the original database query
> was), but you have to read every row in the ResultSet just to get the
> first doc ID.
>
> Contrast this with using doc IDs for the database query. You don't
> need to hit the index at all since you already have the result. And
> the docs come back in order, so you don't even have to iterate the
> entire result set - you can read the first 100 rows and then read more
> rows if/when they are needed. And if the caller is using skipTo then
> this can be incorporated into the database query to avoid returning
> rows which are only going to be discarded anyway.
>
> Integer fields should have improved things a little in terms of the
> amount of I/O required to do the query (at least I would hope that
> this is the case - I haven't done any tests yet and we can't use them
> yet for backwards compatibility reasons) but they don't remove the
> problem of needing to iterate every document in the result set
> up-front.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
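
To make the point about segment-local doc IDs concrete, here is a rough toy model in plain Java. It does not use any Lucene API: the class SegmentDocIds, its methods, and the GUID strings are all made up for illustration. It assumes the index-wide doc ID is the segment-local ID plus the total size of the segments before it, and it assumes a merge that combines two non-adjacent segments. Whether a real merge reorders segments like this depends on the merge policy, so treat it only as a sketch of why the index-wide number is not a stable key.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: each String[] stands for one segment, and the array
// position of a GUID is its segment-local doc ID (always starting at 0).
class SegmentDocIds {

    // Index-wide doc ID = sum of the sizes of all earlier segments + local ID.
    static int globalDocId(List<String[]> segments, int segmentIndex, int localId) {
        int base = 0;
        for (int i = 0; i < segmentIndex; i++) {
            base += segments.get(i).length;
        }
        return base + localId;
    }

    // Linear scan for a GUID; returns its index-wide doc ID, or -1 if absent.
    static int find(List<String[]> segments, String guid) {
        for (int s = 0; s < segments.size(); s++) {
            String[] segment = segments.get(s);
            for (int local = 0; local < segment.length; local++) {
                if (segment[local].equals(guid)) {
                    return globalDocId(segments, s, local);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three segments before the (hypothetical) merge.
        List<String[]> before = Arrays.asList(
                new String[] {"a", "b", "c"},
                new String[] {"d", "e"},
                new String[] {"f"});

        // Assumed merge: the first and third segments are combined into one
        // new segment that ends up after the untouched second segment.
        List<String[]> after = Arrays.asList(
                new String[] {"d", "e"},
                new String[] {"a", "b", "c", "f"});

        // No document was deleted, yet several index-wide IDs differ.
        for (String guid : new String[] {"a", "b", "c", "d", "e", "f"}) {
            System.out.println(guid + ": " + find(before, guid)
                    + " -> " + find(after, guid));
        }
    }
}
```

In this model nothing is deleted, but "d" still moves from 3 to 0 and "a" from 0 to 2, which is why I would key the database rows on the GUID rather than on the doc ID.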