One thing that may affect this, and that I usually forget, is that if your
object has a unique identifier (client_no), that identifier must be included
in the override of the "equals" method and in the computation of hashCode.
Otherwise, if you store this object in a collection and different routines
access/update that collection, you will get unpredictable results.
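For example, a minimal sketch of what that looks like (the Client class and
its fields here are hypothetical, just to illustrate the point):

    import java.util.Objects;

    public final class Client
    {
        private final long clientNo; // the unique identifier (client_no)
        private String name;         // mutable state; kept out of equals/hashCode

        public Client( final long clientNo, final String name )
        {
            this.clientNo = clientNo;
            this.name = name;
        }

        @Override
        public boolean equals( final Object o )
        {
            if ( this == o )
            {
                return true;
            }
            if ( !( o instanceof Client ) )
            {
                return false;
            }
            // Equality is driven by the unique identifier ...
            return this.clientNo == ( (Client) o ).clientNo;
        }

        @Override
        public int hashCode()
        {
            // ... and the same identifier drives the hash code, so the two
            // methods stay consistent and hash-based collections (HashSet,
            // HashMap) behave predictably when shared between routines.
            return Objects.hash( this.clientNo );
        }
    }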
On Fri, Apr 11, 2014 at 10:59 AM, Shai Erera <ser...@gmail.com> wrote:

> Hi
>
> I am not sure how more than one client_no field ends up w/ a document, and
> I'm not sure it's related to the taxonomy at all.
>
> However, looking at the code example you pasted above, and since you
> mention that you index+commit in one thread while another thread does the
> reopen, I wonder if that's the issue: you first commit the taxo, then
> commit the index. But what if a new document makes it into the index after
> you committed the taxo, with a new client_no? In that case, the reopening
> thread will discover an "older" taxonomy, while the index will have
> categories with ordinals larger than the taxonomy's greatest ordinal.
>
> I also think that it's a mistake to commit and reopen in two separate
> threads. If possible, I suggest that you always do both in the same
> thread, and in this order: first commit the index, then the taxonomy. That
> way, if a document goes into the index (and new facets into the taxonomy)
> after the index.commit(), then when you reopen, the worst case is that the
> taxonomy is "ahead" of the index, which is fine. When you reopen, also
> reopen in the same order.
>
> Could you try that and see if it resolves your issue? Although, I don't
> understand how this can lead to more than one client_no ending up in one
> document, unless there's also a concurrency bug in the indexing code ... or
> I misunderstood the issue.
>
> Shai
>
> On Fri, Apr 11, 2014 at 2:49 PM, Rob Audenaerde <rob.audenae...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I have an issue using near real-time search with the taxonomy. I could
> > really use some advice on how to debug/proceed with this issue.
> >
> > The issue is as follows:
> >
> > I index 100k documents, with about 40 fields each. For each field, I
> > also add a FacetField (the issue arises both with FacetField and
> > FloatAssociationFacetField). Each document has a unique number field
> > (client_no).
> >
> > When just indexing and searching afterwards, all is fine.
> >
> > When searching while indexing, sometimes the number of facets associated
> > with a document is too high, i.e. when collecting facets there is more
> > than one client_no on one document, which of course should not be the
> > case.
> >
> > Before each search, I call manager.maybeRefreshBlocking(), because I
> > want the most up-to-date results.
> >
> > I have a taxonomy reader and index reader combined in a ReferenceManager
> > (I created this before the SearcherTaxonomyManager existed, but it
> > behaves exactly the same, with similar refcount logic).
> >
> > During indexing I commit every 5000 documents (not needed for the NRT
> > search, but needed to prevent loss should the application shut down). I
> > commit as follows:
> >
> > public void commit() throws DocumentIndexException
> > {
> >     try
> >     {
> >         synchronized ( GlobalIndexCommitAndCloseLock.LOCK )
> >         {
> >             this.taxonomyWriter.commit();
> >             this.luceneIndexWriter.commit();
> >         }
> >     }
> >     catch ( final OutOfMemoryError | IOException e )
> >     {
> >         tryCloseWritersOnOOME( this.luceneIndexWriter, this.taxonomyWriter );
> >         throw new DocumentIndexException( e );
> >     }
> > }
> >
> > I use a standard IndexWriterConfig, and both the IndexWriter and the
> > TaxonomyWriter use a RAMDirectory().
> >
> > My test case indexes the 100k documents while another thread
> > continuously calls manager.maybeRefreshBlocking(). This is enough to
> > sometimes cause the taxonomy to be incorrect.
> >
> > The number of indexing threads does not seem to influence the issue, as
> > it also appears when I have only 1 indexing thread.
> >
> > I know it is an index problem, because when I write the index to a file
> > instead of RAM and reopen it in a clean application, I see the same
> > behaviour.
> >
> > I could really use some advice on how to debug/proceed with this issue.
> > If more info is needed, just ask.
> >
> > Thanks in advance,
> >
> > -Rob
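For reference, a minimal sketch of the commit order Shai suggests, reusing
the lock, writer fields, and helper from Rob's snippet above (everything
except the swapped commit order is assumed from that snippet):

    public void commit() throws DocumentIndexException
    {
        try
        {
            synchronized ( GlobalIndexCommitAndCloseLock.LOCK )
            {
                // Index first, taxonomy second: a document (and any new
                // facet ordinals) arriving between the two commits can only
                // leave the committed taxonomy "ahead" of the committed
                // index, which is harmless. The reverse order can leave the
                // committed index referencing ordinals the committed
                // taxonomy does not contain yet.
                this.luceneIndexWriter.commit();
                this.taxonomyWriter.commit();
            }
        }
        catch ( final OutOfMemoryError | IOException e )
        {
            tryCloseWritersOnOOME( this.luceneIndexWriter, this.taxonomyWriter );
            throw new DocumentIndexException( e );
        }
    }

Per Shai's advice, this commit and the subsequent reopen are best done from
the same thread, in the same order.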
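And since Rob's ReferenceManager mimics SearcherTaxonomyManager, here is a
sketch of the search side using that class directly (based on my reading of
the Lucene 4.x API; the writer and taxoWriter arguments are assumed to be
the application's existing IndexWriter and DirectoryTaxonomyWriter):

    import java.io.IOException;

    import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
    import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager.SearcherAndTaxonomy;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
    import org.apache.lucene.index.IndexWriter;

    public class NrtFacetSearch
    {
        private final SearcherTaxonomyManager manager;

        public NrtFacetSearch( final IndexWriter writer,
                final DirectoryTaxonomyWriter taxoWriter ) throws IOException
        {
            // Reopens the IndexSearcher and TaxonomyReader together, so
            // the pair is always consistent:
            this.manager = new SearcherTaxonomyManager( writer, true, null, taxoWriter );
        }

        public void search() throws IOException
        {
            // Refresh before each search to see the latest documents:
            this.manager.maybeRefreshBlocking();
            final SearcherAndTaxonomy sat = this.manager.acquire();
            try
            {
                // sat.searcher and sat.taxonomyReader form a matched pair;
                // run the query and collect facets here.
            }
            finally
            {
                this.manager.release( sat );
            }
        }
    }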