Yes, that all looks reasonable. Maybe there is a mismatch in the
analysis chain? I'm just throwing out wild guesses because I don't
really see any problems in what you shared. Also, if the problem
really has something to do with ControlledRealTimeReopenThread, I'm not
going to have the answer, so I apologize, but I think I need to bow out.
-Mike
On 12/1/2014 6:22 PM, Badano Andrea wrote:
Thanks for your reply!
I try to delete documents using a term that matches a Document TextField:
private static final String NAME = "name";

private void store(String n, ... other fields ...) {
    Document d = new Document();
    d.add(new TextField(NAME, n, Field.Store.YES));
    ... add other fields ...
    _iw.addDocument(d);
}

private void remove(String n) {
    Term t = new Term(NAME, n);
    _iw.deleteDocuments(t);
}
Is it possible to remove a document in this manner? Create a Term object based
on a document field of type TextField?
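One thing worth checking here: a TextField is run through the analyzer at index time, while a Term passed to deleteDocuments() is matched as-is, without analysis. If the analyzer lowercases or splits the name, the raw value may not exist as a term in the index at all. The following is only a toy model in plain Java (not Lucene code; the class AnalyzedDeleteDemo and its lowercase-and-split "analyzer" are made up for illustration) showing how such a mismatch makes a delete match nothing:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model, NOT Lucene: an "analyzed" field is indexed as lowercased word tokens. */
final class AnalyzedDeleteDemo {
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> live = new HashSet<>();
    private int nextId = 0;

    /** Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Index a document under the tokens produced by analyze(). */
    int add(String name) {
        int id = nextId++;
        live.add(id);
        for (String token : analyze(name)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    /** Delete by exact term, with no analysis applied; returns how many docs were removed. */
    int deleteByTerm(String term) {
        int removed = 0;
        for (Integer id : postings.getOrDefault(term, Collections.<Integer>emptySet())) {
            if (live.remove(id)) {
                removed++;
            }
        }
        return removed;
    }

    int liveCount() {
        return live.size();
    }

    public static void main(String[] args) {
        AnalyzedDeleteDemo index = new AnalyzedDeleteDemo();
        index.add("Blue Widget");
        System.out.println(index.deleteByTerm("Blue Widget")); // 0: raw value is not a token
        System.out.println(index.deleteByTerm("blue"));        // 1: matches an indexed token
    }
}
```

In Lucene itself, the usual way to avoid this is to keep a separate identifier field that is indexed verbatim as a single token (StringField does this) and to build the delete Term from that field.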
I never close() any of the documents created in my wrapper.
All add/update/deletes go via the TrackingIndexWriter, while all commits are
called on the underlying IndexWriter.
Regards,
Andrea
On 1 Dec 2014, at 23:23, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
It's impossible to tell since you didn't include the code for it, but my advice
would be to look at how the documents are being marked for deletion. What are
the terms being used to delete them? Are you trying to use lucene docids?
-Mike
On 12/1/2014 4:22 PM, Badano Andrea wrote:
Hello,
My apologies for a longish question.
I am having some problems with a class that tries to ensure that a Lucene index
is always kept up-to-date with the contents of a MySQL master database. Users
add, modify, and delete items in the master database, and all changes to the
master database are immediately propagated to the index. When the application
starts up, all items present in the master database that are not present in the
index are added to the index. Similarly, all items present in the index that
are not present in the master database are removed from the index.
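The startup reconciliation described above amounts to two set differences. As a rough sketch in plain Java, assuming every item has a unique string key that is stored both in the database and in the index (the class IndexSyncPlan and its field names are hypothetical, not taken from the actual application):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: works out what to add to and remove from the index at startup. */
final class IndexSyncPlan {
    final Set<String> toAdd;     // keys in the database but missing from the index
    final Set<String> toRemove;  // keys in the index but no longer in the database

    private IndexSyncPlan(Set<String> toAdd, Set<String> toRemove) {
        this.toAdd = toAdd;
        this.toRemove = toRemove;
    }

    static IndexSyncPlan between(Collection<String> dbKeys, Collection<String> indexKeys) {
        Set<String> toAdd = new TreeSet<>(dbKeys);
        toAdd.removeAll(indexKeys);      // database minus index
        Set<String> toRemove = new TreeSet<>(indexKeys);
        toRemove.removeAll(dbKeys);      // index minus database
        return new IndexSyncPlan(toAdd, toRemove);
    }

    public static void main(String[] args) {
        IndexSyncPlan plan = between(
                Arrays.asList("a", "b", "c"),   // keys read from the database
                Arrays.asList("b", "c", "d"));  // keys read from the index
        System.out.println("add: " + plan.toAdd + ", remove: " + plan.toRemove);
    }
}
```

The keys in toRemove are the values that would then be turned into delete terms, so they need to match what was actually indexed.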
I am trying to do this with code based on
http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
Automatically copying data from the master database to the index seems to work.
However, removing items from the index not present in the database does not
seem to work.
So I have this class:
class IndexWrapper {
    private final IndexWriter _iw;
    private final TrackingIndexWriter _triw;
    private final ReferenceManager<IndexSearcher> _rmgr;
    private final ControlledRealTimeReopenThread<IndexSearcher> _reopen;
    private final Analyzer _analyzer;
    private AtomicLong _gen;
    ...
}
that is set up as follows:
_iw = new IndexWriter(directory,
        new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer));
_triw = new TrackingIndexWriter(_iw);
_rmgr = new SearcherManager(_iw, true, null);
_reopen = new ControlledRealTimeReopenThread<IndexSearcher>(_triw, _rmgr, 60.00, 0.1);
_analyzer = analyzer;
_gen = new AtomicLong(_triw.getGeneration());
_reopen.start();
First some code that fetches every doc in the index is called:
// wait until the index is re-opened for the last update
_reopen.waitForGeneration(_gen.get());
IndexSearcher searcher = _rmgr.acquire();
try {
    ... fetch all documents in index ...
}
finally {
    _rmgr.release(searcher);
}
This returns all docs in the index. Later on, there is an attempt to remove
some of these documents (the ones that no longer exist in the master database):
long curr = _gen.get();
_gen.compareAndSet(curr, _triw.deleteDocuments(termToRemove));
_iw.commit();
This code runs without any exceptions being thrown, but it does not seem to
remove anything.
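A side note on the generation bookkeeping in that snippet, which is probably unrelated to the delete finding nothing: compareAndSet() silently drops the new generation if another thread changed _gen between the get() and the compareAndSet(). One possible alternative is to keep the maximum generation seen so far. A minimal sketch in plain Java (the class GenerationTracker is hypothetical; the long passed to record() stands for whatever the tracking writer's add/update/delete call returned):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: remembers the highest generation seen, even under concurrent updates. */
final class GenerationTracker {
    private final AtomicLong gen;

    GenerationTracker(long initial) {
        gen = new AtomicLong(initial);
    }

    /** Record a generation returned by a write; the stored value never moves backwards. */
    long record(long generation) {
        return gen.accumulateAndGet(generation, Math::max);
    }

    /** The generation a reader should wait for before searching. */
    long latest() {
        return gen.get();
    }

    public static void main(String[] args) {
        GenerationTracker tracker = new GenerationTracker(5);
        tracker.record(7);
        tracker.record(6); // an older generation arriving late does not overwrite 7
        System.out.println(tracker.latest());
    }
}
```

With something like this, the value of latest() would be what gets passed to waitForGeneration() before acquiring a searcher.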
If I enable logging, I see things such as:
DW : anyChanges? numDocsInRam=0 deletes=false hasTickets:false pendingChangesInFullFlush: false
Supposedly the printout
numDocsInRam=0
means that commit() has not found any documents to delete. Also, if I add some
extra logging to IndexWriter.deleteDocuments() like so:
public void deleteDocuments(Term... terms) throws IOException {
    ensureOpen();
    try {
        boolean dt = docWriter.deleteTerms(terms);
        // Arrays.toString prints every term; passing the array directly to
        // printf would spread it as varargs and print only the first one.
        System.err.printf("DELETING TERMS : %s\n", java.util.Arrays.toString(terms));
        System.err.printf("DT : %s\n", dt);
        if (dt) {
            processEvents(true, false);
        }
    } catch (OutOfMemoryError oom) {
        tragicEvent(oom, "deleteDocuments(Term..)");
    }
}
I can see printouts :
DT : false
So, an IndexWriter is given to a ReferenceManager, which is then used to create
an IndexSearcher that returns a set of documents. Yet later, when an attempt is
made to remove some of these documents, the IndexWriter (or rather, its
docWriter) cannot find these documents. Assuming that the IndexWriter is
somehow involved in the initial fetch of all documents, I am confused about how
the IndexWriter, a short while later, cannot find some of these documents that
have been marked (by my application) for deletion. I am pretty sure that the
Term objects that are passed into deleteDocuments() are compatible with the
documents previously returned by the IndexSearcher.
So have I misunderstood the role of the IndexWriter as some kind of central
gateway to all documents?
Andrea
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org