Users can index a lot of data, so I'd rather not keep things in memory for too long.

Even if I keep a set of the things added, how do I know if something has since been removed by a delete? It seems rather difficult to keep this set of added documents in sync with the index reader on the index (before it has been written to).

What I'd like is to have access to the documents the index writer has written but not yet committed. Is there something that can access that data?

Daniel Shane

Shai Erera wrote:
How many documents do you index between reader refreshes? If it's not
too many, I'd keep a Set of those terms and check every incoming document
against the set and then the reader.

Note that the set keeps only the terms of those documents your reader
doesn't see yet. You should clear() it after you've refreshed your reader.
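
A rough sketch of the Set approach described above, in Java. The class name
PendingUniqueKeys and the visibleInReader predicate are hypothetical; the
predicate stands in for whatever lookup you run against your reader (for
example a docFreq check on the unique field's term). The second set, for
values deleted since the last refresh, is an assumption added here so that a
value the stale reader still sees can be re-added after a delete.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Sketch only: tracks unique-field values added or deleted since the reader
 * was last refreshed, and falls back to the reader for everything else.
 */
class PendingUniqueKeys {
    // Values added since the last refresh (not yet visible to the reader).
    private final Set<String> pendingAdds = new HashSet<>();
    // Values deleted since the last refresh (possibly still visible to the reader).
    private final Set<String> pendingDeletes = new HashSet<>();
    // Hypothetical stand-in for a reader lookup, e.g.
    // reader.docFreq(new Term(field, value)) > 0
    private final Predicate<String> visibleInReader;

    PendingUniqueKeys(Predicate<String> visibleInReader) {
        this.visibleInReader = visibleInReader;
    }

    /** Records the value and returns true if it is free; returns false on a conflict. */
    synchronized boolean tryAdd(String value) {
        if (pendingAdds.contains(value)) {
            return false; // added since the last refresh
        }
        if (visibleInReader.test(value) && !pendingDeletes.contains(value)) {
            return false; // the reader sees it and it has not been deleted since
        }
        pendingAdds.add(value);
        return true;
    }

    /** Call alongside the delete issued to the index writer. */
    synchronized void delete(String value) {
        pendingAdds.remove(value);
        pendingDeletes.add(value);
    }

    /** Call once the reader has been refreshed and sees all pending changes. */
    synchronized void onReaderRefreshed() {
        pendingAdds.clear();
        pendingDeletes.clear();
    }

    public static void main(String[] args) {
        Set<String> committed = new HashSet<>();
        committed.add("A-1"); // pretend the reader already sees this value
        PendingUniqueKeys keys = new PendingUniqueKeys(committed::contains);

        System.out.println(keys.tryAdd("A-1")); // false: visible in the reader
        System.out.println(keys.tryAdd("B-2")); // true: new value
        System.out.println(keys.tryAdd("B-2")); // false: pending add
        keys.delete("A-1");
        System.out.println(keys.tryAdd("A-1")); // true: deleted since the last refresh
    }
}
```

Both sets only grow between refreshes, so memory use is bounded by how often
the reader is reopened. The sketch assumes the add or delete is applied to
the index writer in the same step as the call on this class.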

In 2.9, IndexWriter will expose a getReader(), so you might be able to use
it, by checking both its reader and the on-disk reader.

If it's possible, I think I'd prefer the first approach.

Shai

On Thu, Aug 13, 2009 at 5:33 PM, Daniel Shane <sha...@lexum.umontreal.ca> wrote:

Hi all!

I'm currently running a big Lucene index and one of my main concerns is the
integrity of the data entered. A few things come to mind, like enforcing
that certain fields be non-blank, forcing certain formats, etc.

All these validations are easy to do with Lucene, since I can validate the
document before it is indexed or when it is retrieved.

The thing I have a hard time with, however, is field uniqueness.

Let's say I have a field and I really want it to be unique. I can't seem to
find out how to do it during the indexing phase, since nothing that is
added to the index is readable by an index reader until the index is
closed.

Add to that the fact that items can be deleted from the index during
indexing, and the only way I have to verify uniqueness is to go through
every unique field value using termEnums and check its docFreq.

This has the major disadvantage that I cannot inform people who are using the
library of a uniqueness conflict when it happens, only when the index is
closed.

Does anyone have an idea on how I could check an index that is in the
process of being indexed (things added, things deleted) for the uniqueness of
a given field *at the time I index a document*?

Daniel Shane


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




