Re: updateDocument (somtimes) no longer deleting documents after Update to 4.6

nospam Mon, 24 Feb 2014 04:55:22 -0800

I'll see if I can dig a little bit deeper into the 3.6 behavior, fornow I'm trying to get it running on 4.6 (as the index file is also a lotsmaller - on 3.6 it was about 2 GB for about 9000 documents, with 4.6it's only about 200 MB).

And yes the business ID is indexed - otherwhise I wouldn't be able tofind it at all - The problem is not that I can't find it but I find ittwice. And to make matters worse not consistently all the bime but onlysometimes. Somehow it looks like the delete (before the update) doessometimes work and sometimes not. Do you know any chances why this couldhappen? Maybe something related to the MergePoliy (which we don't sete.g. we are using the default)


Best Regards

Kai


Am 2014-02-24 12:10, schrieb Michael McCandless:

The 30 second turnaround time in 3.6.x is absurd; if you turn on
IndexWriter's infoStream maybe it'd give a clue.  Or, capture a few
stack traces and post them.

How are you creating the luceneDocumentToIndex?  You must ensure that
the business ID is in fact indexed as a field in the document,
otherwise the update won't find it.


Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 24, 2014 at 5:33 AM,  <nos...@kaigrabfelder.de> wrote:
Hi there,
we recently updated our application from lucene 3.0 to 3.6 with theeffect
that (albeit using the SearchManager functionality as described on

http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
calls to searcherManager.maybeRefresh() were incredibly slow. e.g.takingabout 30 seconds after adding one document to the index with anindex of
about 9000 documents. I assumed that we did something wrong with the
configuration as 30 seconds could not be meant with NRT ;-)
Thus we migrated to the latest 4.6 version and indexing speed wasindeedvery good now (with the searcherManager.maybeRefreshBlocking() callonlytaking milliseconds to complete). But after some wore testing wediscovered
that somehow the indexWriter.updateDocument( term, documentToIndex )
functionality wasn't working anymore as expected - at leastsomtetimes. Itlooks like either the updateDocument method does not longer reliablydeletethe old document before adding a new one - with the result thatolder
documents are beeing returned by searches breaking our application.
Unfortunately I'm not able to reproduce the issues in a simple unittest butmaybe somebody of the lucene experts knows what we are doing wronghere. Notsure if it is of any relevance but we are running on Windows with a64 bit
JDK 7 thus MMapDirectory is beeing used.

Our Index Writer is configured like this:
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_46,new LimitTokenCountAnalyzer( new DefaultAnalyzer(),Integer.MAX_VALUE ) );
        conf.setOpenMode( OpenMode.APPEND );
IndexWriter indexWriter = new IndexWriter( FSDirectory.open(new
File( directoryPath )), conf );

SearcherManager is configured like this:
searcherManager = new SearcherManager(indexWriter, true,null);
// The anlyzer that we are using looks like this:

        public class DefaultAnalyzer extends Analyzer
        {
           @Override
protected TokenStreamComponents createComponents(finalString
fieldName,
                   final Reader reader) {
                 return new TokenStreamComponents(new
WhitespaceTokenizer(LuceneSearchService.LUCENE_VERSION, reader));
           }
        }

The update of the index looks like this:

        // instead of 42 the unique business identifier is used
        Long myUniqueBusinessId = 42l;
        BytesRef ref = new BytesRef(NumericUtils.BUF_SIZE_LONG);
NumericUtils.longToPrefixCoded(myUniqueBusinessId.longValue(), 0,
ref );
        Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
// this method may be called multiple times with the sameterm and
luceneDocumentToIndex parameter
        indexWriter.updateDocument( term, luceneDocumentToIndex);

        // After performing a couple of updates we execute
        searcherManager.maybeRefreshBlocking();


// For searching we are using the following code
        searcher = searcherManager.acquire();
// luceneQuery is the query, filter is some sort offiltering that
we apply, luceneSort is some sorting query
TopDocs topDocs = searcher.search( luceneQuery, filter,1000,
luceneSort );
// If we perform a query for MY_UNIQUE_BUSINESS_ID it will returnmultipleresults instead of just one - this was neither the case with lucene3.0 nor
3.6
In order to fix the issue I tried couple of things but to now avail.It
still happens (not all the time though) that the lucene returns two
documents when querying for MY_UNIQUE_BUSINESS_ID instead of justone
-       setting setMaxBufferedDeleteTerms to 1 in the config
        conf.setMaxBufferedDeleteTerms( 1 );
- explicetly deleting instead of just updating
        indexWriter.deleteDocuments( term );
- ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in theindex and
not just analysed
- trying to delete the document via indexWriter.tryDeleteDocument()
- calling indexWriter.maybeMerge() after the update
- calling indexWriter.commit() after the update
Sorry for the lenghty post but I wanted to include as muchinformation as
possible. Let me know if something is missing...

Thanks for helping in advance ;-)

Kai


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: updateDocument (somtimes) no longer deleting documents after Update to 4.6

Reply via email to