Re: delete by doc id

Cam Bazz Tue, 12 Aug 2008 03:11:24 -0700

Hello Andy,

Thanks for your input. I understand what you are saying, and trying to use
Lucene as a relational db is going a little too far. However, in certain
specialized areas, Lucene works better than relational databases.
If you can set up the schema so that it is non-normalized, and if you don't
need too many updates, it works really nicely.

I have read this paper: http://db.cs.yale.edu/vldb07hstore.pdf , in which
the authors claim a speed increase of 2 orders of magnitude (between 80 and
100 times, to be exact) from using specialized databases. (Lucene is not
mentioned.)

I am using Lucene to store graphs. So far I have tried many different
databases, ranging from MySQL, Oracle, Postgres, Objectivity/DB, db4o and
many others. Oracle and MySQL start crying after 10M records. Object
relational dbs are good when it comes to traversal, but locating things is
hard, and if the data is coming in random distributions, ingest rates are
terribly slow.

So far, Lucene has been very fast, both for ingest and for searching. Its
only drawback is delete and update (and I understand why).

I have been bugging the Lucene developers with similar questions for the
past year. With each release Lucene is getting better and better.
And with the deleteByQuery feature added, I think it will be complete (I
will never have to depend on doc ids again).

To solve my problem I have added a third field, which is not stored and is a
combination of the other two fields. It works nicely, but adds some
overhead, I guess.
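
As a rough sketch of that workaround (the class and method names here are
made up for illustration and are not part of Lucene), the combined value can
be built with a length prefix, so that two different (a, b) pairs never
produce the same key:

```java
// Sketch only: CompositeKey and of() are hypothetical names, not Lucene API.
final class CompositeKey {
    private CompositeKey() {}

    /**
     * Joins the two field values into one string.
     * The length of a is written first, so ("ab", "c") and ("a", "bc")
     * give different keys even though plain concatenation would not.
     */
    static String of(String a, String b) {
        return a.length() + ":" + a + b;
    }
}
```

The idea is that this string goes into the extra field, indexed as a single
untokenized term and not stored, so that one call to
IndexWriter.deleteDocuments(Term) on that field can stand in for a delete by
a and b.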

Is there any way the developers can integrate deleteByQuery into the 2.3
release? It would be greatly appreciated, and I believe many other
people are waiting for the same feature.

Best Regards,
-C.B.

On Sat, Aug 9, 2008 at 1:55 AM, Andy Triana <[EMAIL PROTECTED]> wrote:

> I rarely submit but I've been seeing this sort of thing more and more
> on this board.
>
> It seems that there is a need to treat Lucene as if it were a data
> storage or database-like repository, when in
> fact it isn't.
>
> In our case, for large indexes we run either a parallel process to
> create the index or a nightly process, always treating
> the index as a disposable lookup mechanism.
>
> When it comes to deleting items from an index, we simply just mark
> those items in a separate list or data structure and then filter them
> away until the index is refreshed the next go around.
>
> My point is, that it seems like folks want Lucene to act like a
> relational database when in fact it is not meant to be used that way.
>
> Perhaps I'm wrong and upgrades will allow for efficient deletions, but
> it has always been clear to me that Lucene is strictly
> an index and should be treated as a feature of your storage
> repository, i.e. database, file system, web, whatever. But not relied
> upon for that very storage or management of that storage.
>
> The deletion issue is true of almost any indexing engine I've used in
> the past, i.e. DT Search, Verity, etc.
>
> Am I missing something? For us it has been a "best practice" to treat
> Lucene as described.
>
> //andy
>
>
> On Fri, Aug 8, 2008 at 2:39 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> > hello,
> >
> > what would happen if I modified the class IndexWriter, and made the
> > delete by id method public?
> >
> > I have two fields in my documents and I got to be able to delete by those
> > two fields, (by query in other words) and I do not wish to go trunk
> > version.
> >
> > I am getting quite desperate, and if not found a solution I will have to
> > make my documents with 3 fields, a, b and a + b so I can delete by a and
> > b.
> >
> > Best.
> >
> > could there be a side effect?
> >
> > Best.
> >
> > -c.b.
> >
>
