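TextField is dangerous: it is analyzed, possibly into more than one token, and then your deletes won't work, because the Term you pass to deleteDocuments is matched verbatim against the indexed tokens and is not itself analyzed. It's safer to use StringField for tokens you later want to delete by.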
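
To illustrate the point above, here is a minimal sketch in plain Java. It is a toy model and NOT the Lucene API: the class ToyIndex and its methods addAnalyzed, addVerbatim, deleteByTerm and liveCount are hypothetical names, and the "analyzer" is assumed to simply lowercase and split on whitespace. It only shows why an exact-term delete misses a value that was tokenized at index time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of term-based deletion. Hypothetical code, not the Lucene API. */
class ToyIndex {
    // indexed term -> ids of documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // ids of documents that have not been deleted
    private final Set<Integer> liveDocs = new HashSet<>();
    private int nextId = 0;

    /** Models an analyzed field: the value is lowercased and split into tokens (assumed analyzer). */
    int addAnalyzed(String value) {
        int id = nextId++;
        liveDocs.add(id);
        for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    /** Models a non-analyzed field: the whole value is indexed as one term. */
    int addVerbatim(String value) {
        int id = nextId++;
        liveDocs.add(id);
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(id);
        return id;
    }

    /** Deletes live documents containing exactly this term; the term is not analyzed. */
    int deleteByTerm(String term) {
        int deleted = 0;
        for (int id : postings.getOrDefault(term, new HashSet<>())) {
            if (liveDocs.remove(id)) {
                deleted++;
            }
        }
        return deleted;
    }

    int liveCount() {
        return liveDocs.size();
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.addAnalyzed("Moby Dick");                      // indexed as "moby" and "dick"
        System.out.println(idx.deleteByTerm("Moby Dick")); // prints 0: no such term
        idx.addVerbatim("Moby Dick");                      // indexed as "Moby Dick"
        System.out.println(idx.deleteByTerm("Moby Dick")); // prints 1
    }
}
```

In this toy model, deleteByTerm("moby") would remove the analyzed document, which is the same reason a delete by the original, unanalyzed string finds nothing.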
Try making a standalone test that just deletes documents first...

You don't need to call iw.commit() to make deletes visible: the next reader
refresh after the deletes were done will reflect them.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Dec 1, 2014 at 6:44 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> Yes that all looks reasonable. Maybe there is a mismatch in the analysis
> chain? I'm just throwing out wild guesses because I don't really see any
> problems in what you shared. Also - if the problem really has something to
> do with ControlledRealTimeReopenThread, I'm not going to have the answer, so
> I apologize but I think I need to bow out.
>
> -Mike
>
>
> On 12/1/2014 6:22 PM, Badano Andrea wrote:
>>
>> Thanks for your reply!
>>
>> I try to delete documents using a term that matches a Document TextField:
>>
>>     private static final String NAME = "name";
>>
>>     private void store(String n, ... other fields ...) {
>>         Document d = new Document();
>>         d.add(new TextField(NAME, n, Field.Store.YES));
>>         ... add other fields ...
>>         _iw.addDocument(d);
>>     }
>>
>>     private void remove(String n) {
>>         Term t = new Term(NAME, n);
>>         _iw.deleteDocuments(t);
>>     }
>>
>> Is it possible to remove a document in this manner? Create a Term object
>> based on a document field of type TextField?
>>
>> I never close() any of the documents created in my wrapper.
>> All add/update/deletes go via the TrackingIndexWriter, while all commits
>> are called on the underlying IndexWriter.
>>
>> Regards,
>>
>> Andrea
>>
>>
>> On 1 Dec 2014, at 23:23, Michael Sokolov <msoko...@safaribooksonline.com>
>> wrote:
>>
>> It's impossible to tell since you didn't include the code for it, but my
>> advice would be to look at how the documents are being marked for deletion.
>> What are the terms being used to delete them? Are you trying to use lucene
>> docids?
>>
>> -Mike
>>
>> On 12/1/2014 4:22 PM, Badano Andrea wrote:
>>>
>>> Hello,
>>>
>>> My apologies for a longish question.
>>>
>>> I am having some problems with a class that tries to ensure that a lucene
>>> index is always kept up-to-date with the contents of a mysql master
>>> database. Users add, modify, and delete items in the master database, and
>>> all changes to the master database are immediately propagated to the
>>> index. When the application starts up, all items present in the master
>>> database that are not present in the index are added to the index.
>>> Similarly, all items present in the index that are not present in the
>>> master database are removed from the index.
>>>
>>> I am trying to do this with code based on
>>> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
>>> Automatically copying data from the master database to the index seems to
>>> work. However, removing items from the index not present in the database
>>> does not seem to work.
>>>
>>> So I have this class:
>>>
>>>     class IndexWrapper {
>>>         private final IndexWriter _iw;
>>>         private final TrackingIndexWriter _triw;
>>>         private final ReferenceManager<IndexSearcher> _rmgr;
>>>         private final ControlledRealTimeReopenThread<IndexSearcher> _reopen;
>>>         private final Analyzer _analyzer;
>>>         private AtomicLong _gen;
>>>         ...
>>>     }
>>>
>>> that is set up as follows:
>>>
>>>     _iw = new IndexWriter(directory,
>>>         new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer));
>>>     _triw = new TrackingIndexWriter(_iw);
>>>     _rmgr = new SearcherManager(_iw, true, null);
>>>     _reopen = new ControlledRealTimeReopenThread<IndexSearcher>(_triw, _rmgr,
>>>         60.00, 0.1);
>>>     _analyzer = analyzer;
>>>     _gen = new AtomicLong(_triw.getGeneration());
>>>     _reopen.start();
>>>
>>> First some code that fetches every doc in the index is called:
>>>
>>>     // wait until the index is re-opened for the last update
>>>     _reopen.waitForGeneration(_gen.get());
>>>     IndexSearcher searcher = _rmgr.acquire();
>>>     try {
>>>         ... fetch all documents in index ...
>>>     }
>>>     finally {
>>>         _rmgr.release(searcher);
>>>     }
>>>
>>> This returns all docs in the index. Later on, there is an attempt to
>>> remove some of these documents (the ones that no longer exist in the
>>> master database):
>>>
>>>     long curr = _gen.get();
>>>     _gen.compareAndSet(curr, _triw.deleteDocuments(termToRemove));
>>>     _iw.commit();
>>>
>>> This code runs without any exceptions being thrown, but it does not seem
>>> to remove anything. If I enable logging, I see things such as:
>>>
>>>     DW : anyChanges? numDocsInRam=0 deletes=false hasTickets:false
>>>     pendingChangesInFullFlush: false
>>>
>>> Supposedly the printout
>>>
>>>     numDocsInRam=0
>>>
>>> means that commit() has not found any documents to delete. Also, if I add
>>> some extra logging to IndexWriter.deleteDocuments() like so:
>>>
>>>     public void deleteDocuments(Term... terms) throws IOException {
>>>         ensureOpen();
>>>         try {
>>>             boolean dt = docWriter.deleteTerms(terms);
>>>             System.err.printf("DELETING TERMS : %s\n", terms);
>>>             System.err.printf("DT : %s\n", dt);
>>>             if (dt) {
>>>                 processEvents(true, false);
>>>             }
>>>         } catch (OutOfMemoryError oom) {
>>>             tragicEvent(oom, "deleteDocuments(Term..)");
>>>         }
>>>     }
>>>
>>> I can see printouts :
>>>
>>>     DT : false
>>>
>>> So, an IndexWriter is given to a ReferenceManager which is then used to
>>> create an IndexSearcher that returns a set of documents. Yet later, when
>>> an attempt is made to remove some of these documents, the IndexWriter (or
>>> rather, its docWriter), cannot find these documents. Assuming that the
>>> IndexWriter is somehow involved in the initial fetch of all documents, I
>>> am confused how the IndexWriter a short while later cannot find some of
>>> these documents that have been marked (by my application) for deletion.
>>> I am pretty sure that the Term objects that are passed into
>>> deleteDocuments() are compatible with the documents previously returned
>>> by the IndexSearcher. So have I misunderstood the role of the IndexWriter
>>> as some kind of central gateway to all documents?
>>>
>>> Andrea
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org