On Thu, Apr 2, 2009 at 2:26 PM, John Wang wrote:
> Hi Michael:
> Thanks for looking into this.
>
> Approach 2 has a dependency on how fast the delete set performs a check
> on a given id, approach one doesn't. After replacing my delete set with a
> simple bitset, approach 2 gets a 25-30% imp
Hi Michael:
Thanks for looking into this.
Approach 2 depends on how fast the delete set can check a given id;
approach 1 doesn't. After replacing my delete set with a
simple bitset, approach 2 gets a 25-30% improvement.
I understand if the delete set is small, appr
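To make the bitset point concrete, here is a minimal sketch of the approach 2 scan in plain Java. It is not the code from this thread and uses no Lucene classes: the docIdToUid array stands in for the payload-backed docid -> uid mapping, the class and method names are made up for illustration, and uids are assumed to be small non-negative ints so they can index a java.util.BitSet directly.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the approach 2 scan; not Lucene code.
final class DeleteSetScan {

    /**
     * Walks every docid once and collects those whose uid is marked deleted.
     * docIdToUid[docId] is the uid stored for that document (assumption:
     * the whole mapping fits in memory and uids are non-negative).
     */
    static List<Integer> findDeletedDocIds(int[] docIdToUid, BitSet deletedUids) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < docIdToUid.length; docId++) {
            // The membership check runs once per document, so its cost
            // dominates the loop. BitSet.get is a shift and a mask.
            if (deletedUids.get(docIdToUid[docId])) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

Because the check runs once for every document in the index, swapping a hash-based set for a bitset changes the cost of the whole loop, which is consistent with the improvement reported above.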
On Wed, Apr 1, 2009 at 6:37 PM, John Wang wrote:
> a code snippet is worth 1000 words :)
Hear, hear!
OK, now I understand the difference.
With approach 1, for each of N UIDs you use a TermDocs to find the
postings for that UID, and retrieve the one docID corresponding to
that UID. You retrieve
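A rough model of approach 1 as described here, in plain Java rather than Lucene code. The TreeMap is only a stand-in for the term dictionary plus a one-entry posting list per UID; the class and method names are hypothetical. The point it illustrates is that the work is one lookup per UID in the batch, independent of how many documents the index holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for per-UID term lookup; not Lucene code.
final class PerUidLookup {

    /**
     * For each uid in the batch, looks up the single docid posted for it.
     * uidToDocId models a sorted term dictionary where every uid term has
     * exactly one posting (assumption: uids are unique per document).
     */
    static List<Integer> findDocIds(TreeMap<Integer, Integer> uidToDocId, int[] uidsToDelete) {
        List<Integer> docIds = new ArrayList<>();
        for (int uid : uidsToDelete) {
            Integer docId = uidToDocId.get(uid); // one sorted-map seek per uid
            if (docId != null) {                 // uid not in the index: skip it
                docIds.add(docId);
            }
        }
        return docIds;
    }
}
```

In this model the cost grows with the batch size N, not with the index size.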
a code snippet is worth 1000 words :)
private static final Term UID_TERM = new Term("uid_payload", "_UID");
private static class SinglePayloadTokenStream extends TokenStream {
private Token token = new Token(UID_TERM.text(), 0, 0);
private byte[] buffer = new byte[4];
private boolean
On Wed, Apr 1, 2009 at 5:22 PM, John Wang wrote:
> Hi Michael:
>
> 1) Yes, we use TermDocs, exactly what IndexWriter.deleteDocuments(Term)
> is doing under the cover.
This part I understand :)
> 2) We iterate the docid->uid mapping, for each docid, get the
> corresponding uid and check that
Hi Michael:
1) Yes, we use TermDocs, exactly what IndexWriter.deleteDocuments(Term)
is doing under the covers.
2) We iterate the docid->uid mapping and, for each docid, get the
corresponding uid and check whether it is in the deleted set. If so,
add the docid to the list. There is no ui
On Wed, Apr 1, 2009 at 2:04 PM, John Wang wrote:
> My test is essentially this. I took out the reader.deleteDocuments call from
> both scenarios. I took an index of 5m docs and a batch of 1 randomly
> generated uids.
>
> Compared the following scenarios:
> 1)
> * open index reader
> * for each uid i
Thanks Michael for the info.
I do guarantee there are no modifications between when
"MySpecialIndexReader" is loaded and when I iterate and find the deleted
docids. I was not aware, however, that docids move when an IndexWriter is
opened. I thought that happened only when docs are added and committed.
John,
We looked at implementing delete by doc id for LUCENE-1516; however, it
seemed to be something we could implement in a later patch if enough
people wanted it.
The implementation involves maintaining a genealogy of SegmentReaders within
IndexWriter so that deletes to a reader that has
> For me at least, IndexWriter.deleteDocument(int) would be useful.
I completely agree: delete-by-docID in IndexWriter would be a great
feature. Long ago I became convinced of that.
Where this feature always gets stuck (search the lists -- it's gotten
stuck a lot) is how to implement it? At any
On Wed, Apr 1, 2009 at 4:02 AM, Michael McCandless
wrote:
> I think this has the same problem as exposing delete by docID, ie, how
> would you produce that docIdSet?
Whoops, right. I was going by memory that there was a
get(IndexReader) type method there... but that's on Filter of course.
-Yon
Hi Michael:
Let me first share what I am doing w.r.t. deleting by docid:
I have a customized index reader that stores a mapping of docid -> uid in
the payload (something Michael Busch and Ning Li suggested a while back).
That mapping is loaded at IndexReader load time and is shared by searche
John,
I think this has the same problem as exposing delete by docID, ie, how
would you produce that docIdSet?
We could consider delete by Filter instead, since that exposes the
necessary getDocIdSet(IndexReader) method.
Or, with near real-time search, we could enhance it to allow deletions
via t
So do you think it is a good addition/change to the current api now?
-John
On Tue, Mar 31, 2009 at 2:18 PM, Yonik Seeley wrote:
> On Tue, Mar 31, 2009 at 4:58 PM, John Wang wrote:
> > I fail to see the difference of exposing the api to allow for a Query
> > instance to be passed in vs a DocIdSe
On Tue, Mar 31, 2009 at 4:58 PM, John Wang wrote:
> I fail to see the difference of exposing the api to allow for a Query
> instance to be passed in vs a DocIdSet.
I was commenting specifically on your idea to allow deletion by int[]
(docids) on the IndexWriter.
DocIdSet is a different issue - i
I fail to see the difference between exposing the API to allow a Query
instance to be passed in and allowing a DocIdSet. In this specific case, Query is
essentially a factory to produce a DocIdSetIterator (or Scorer). Isn't that
what a DocIdSet is?
Thanks
-John
On Tue, Mar 31, 2009 at 12:57 PM, Yonik Seeley
wrot
On Tue, Mar 31, 2009 at 3:41 PM, John Wang wrote:
> Also, can we expose IndexWriter.deleteDocuments(int[] docids)?
Exposing internal ids from the IndexWriter may not be a good idea
given that they are transient.
-Yonik
http://www.lucidimagination.com
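One way to picture why internal ids are transient: when segments are merged, deleted documents are dropped and the surviving documents are renumbered. The following is a toy model in plain Java, not Lucene code; the class and method names are invented for illustration, and a segment is reduced to an array of uids indexed by docid.

```java
import java.util.BitSet;

// Toy model of docid renumbering after deletes are merged away; not Lucene code.
final class DocIdCompaction {

    /**
     * Returns the docid -> uid array after dropping deleted slots.
     * Every surviving document at or after a deleted slot slides down,
     * so its docid changes even though its uid does not.
     */
    static int[] compact(int[] docIdToUid, BitSet deletedDocs) {
        int live = 0;
        for (int d = 0; d < docIdToUid.length; d++) {
            if (!deletedDocs.get(d)) {
                live++;
            }
        }
        int[] merged = new int[live];
        int next = 0;
        for (int d = 0; d < docIdToUid.length; d++) {
            if (!deletedDocs.get(d)) {
                merged[next++] = docIdToUid[d];
            }
        }
        return merged;
    }
}
```

For example, with uids {42, 7, 19, 3} and docid 1 deleted, the result is {42, 19, 3}: uid 19 was docid 2 before compaction and is docid 1 afterwards, so a docid captured before the merge would point at the wrong document.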
--
Hi guys:
The IndexWriter.deleteDocuments(Query query) API does not really make sense
to me. Wouldn't IndexWriter.deleteDocuments(DocIdSet set) be better, since
we don't really care about scoring for this call?
Also, can we expose IndexWriter.deleteDocuments(int[] docids)? Using the
c