Well, docIDs are used all over the place in the index. Sometimes they key into an index file "linearly", like for stored fields and term vectors index files, but other times they are encoded eg in the posting lists.

Mike

Cam Bazz wrote:

Yes, I have found. however it is not for reqular indexing. I am using lucene as a graph database. So I add edges. An edge is a document with srcId and dstId field. However I am using a caching scheme which employs a LRU a weakHashMap and a two link lists for keeping track of reads and writes. (It is not working yet, but I will tell you when it does)

About doc ids being transparent. are the doc ids point to places in diskspace?

Best Regards,
C.A.



On Jan 22, 2008 11:51 AM, Michael McCandless < [EMAIL PROTECTED]> wrote:

Woops, sorry, it's hiding in DocumentsWriter: the addDeleteDocID
method.

And, it's private.  So you will actually have to modify the Lucene
core sources (not just subclass IndexWriter) to experiment with
deleting by docID in IndexWriter.  If you find a clean way to safely
expose this functionality, please post back with your results!

Mike

Cam Bazz wrote:

I am looking at the IndexWriter source code - and I could not find a method (private) to delete by doc id.
Where is it hiding?

Best,
- C.B.

On Jan 19, 2008 1:07 PM, Michael McCandless < [EMAIL PROTECTED]> wrote:

Good question....

So far, this method has not been carried over to IndexWriter because
in general it's not really safe, since there's no way to get an
accurate docID from IndexWriter itself.

You can't really "know" when IndexWriter does merges that compacts
deletes and thus changes docIDs.  So, if you open a reader on the
side, get a docID you want to delete, and then go and ask IndexWriter
to delete that docID, you may in fact delete the wrong document.  In
2.3 , where segment merges are now done with a background thread, it's
even worse, because a merge could complete and be committed, thus
changing docIDs, at any time...

See complex discussion here:

    http://markmail.org/message/wxqel3gd6cmavk5a

As of 2.3, the low level infrastructure was added to IndexWriter for
deleting by document ID, but this is not exposed publicly (this was a
side effect of LUCENE-1112).  It's only used, internally, to delete a
document if an exception is hit while indexing it.  In theory, you
could then subclass IndexWriter and tap into this infrastructure to
delete by docID, but, you're entering dangerous territory!

Do you have a specific use case in mind here?  I think we'd like to
make this option available someday in IndexWriter, but doing so now
(when there is no way to get a "reliable" docID) seems too dangerous...

Mike

Cam Bazz wrote:

> Hello,
>
> How do I delete a specific document from an indexwriter? I
> understand there
> is deleteDocuments(term) which deletes all the documents matching
> the term.
> But what if I want to delete a document that has more then one term in
> specific. I can search the document with a boolean query, and then
> get the
> doc id.
> I know that doc ids are temporary, but can I not use it for delete?
>
> IndexReader has a delete by doc id method, but I am not sure how to
> use this
> when using an indexwriter.
>
> Best,
> C.B.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Reply via email to