You can't rely on how IndexWriter will iterate/consume those fields; that's an implementation detail.
Maybe you could use CachingTokenFilter to pre-process the text fields and append the new fields? Then, during indexing, replay the cached tokens so you don't have to tokenize twice.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Mar 11, 2014 at 2:33 PM, Stephen Green <eelstretch...@gmail.com> wrote:
> I'm working on a system that uses Lucene 4.6.0, and I have a couple of use
> cases for documents that modify themselves as they're being indexed.
>
> For example, we have text classifiers that we would like to run on the
> contents of certain fields. These classifiers produce field values (i.e.,
> the classes that the document is in) that I would like to be part of the
> document.
>
> Now, the text classifiers want to tokenize the text in order to do the
> classification, and I'd like to avoid re-tokenizing the text multiple
> times, so I can build a token filter that collects the tokens and then runs
> the classifier. This filter can know about the oald.Document that's being
> processed, but I suspected that adding elements to Document.fields while
> it's being indexed would lead to a concurrent modification exception.
>
> Since IndexWriter.addDocument takes an Iterable<IndexableField>, I figured
> I could just make my own document class that implemented Iterable, but
> would allow me to add new fields onto the end of the document and extend
> the iteration to cover those fields.
>
> I did this, but it didn't have the effect that I was hoping for, because
> the fields that were added were never processed.
>
> Working through the code, I discovered that
> DocFieldProcessor.processDocument iterates through all the fields in the
> document, collecting them by field name (using its own hash table?) before
> processing them.
>
> Of course, this breaks my add-fields-as-other-fields-are-being-processed
> approach because the iterator is exhausted before any of the processing
> happens.
>
> So, my questions are: Does it make any sense to try to do this? If so, is
> there an approach that will work without having to rewrite a lot of
> indexing code?
>
> Thanks,
>
> Steve Green
> --
> Stephen Green
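
The cache-and-replay idea from the reply can be sketched without any Lucene dependency. This is only an illustration of the control flow, not Lucene code: ReplaySketch, CachedTokens, classify and buildDocument are all hypothetical names, the whitespace/punctuation tokenizer and the keyword "classifier" are stand-ins, and the field map stands in for a document. In Lucene 4.x the corresponding pieces would presumably be a CachingTokenFilter wrapped around the analyzer's TokenStream and a field constructed from that TokenStream. The point is that the text is tokenized once, the classifier reads the cached tokens, and the derived field is added before the document is handed to the indexer, so nothing depends on how the indexer iterates the fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Library-free sketch: every name here is hypothetical and only stands in
// for the Lucene concepts discussed in this thread.
class ReplaySketch {

    /** Tokenizes once and keeps the tokens so they can be replayed many times. */
    static final class CachedTokens implements Iterable<String> {
        private final List<String> tokens;

        CachedTokens(String text) {
            List<String> out = new ArrayList<>();
            // Stand-in tokenizer (an assumption): lowercase, split on non-word chars.
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) {
                    out.add(t);
                }
            }
            this.tokens = Collections.unmodifiableList(out);
        }

        /** Each call starts a fresh pass over the cached tokens. */
        @Override
        public Iterator<String> iterator() {
            return tokens.iterator();
        }

        int size() {
            return tokens.size();
        }
    }

    /** Toy classifier (an assumption): one class per known keyword present. */
    static List<String> classify(Iterable<String> tokens) {
        List<String> classes = new ArrayList<>();
        for (String t : tokens) {
            if (t.equals("lucene") && !classes.contains("search")) {
                classes.add("search");
            }
            if (t.equals("java") && !classes.contains("programming")) {
                classes.add("programming");
            }
        }
        return classes;
    }

    /** Builds the complete field map before anything is handed to an indexer. */
    static Map<String, Object> buildDocument(String body) {
        CachedTokens cached = new CachedTokens(body); // tokenize exactly once
        List<String> classes = classify(cached);      // first pass over the cache
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("body", cached);                      // an indexer would replay the cache
        doc.put("classes", classes);                  // derived field, added up front
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = buildDocument("Lucene is a Java search library");
        System.out.println(doc.get("classes"));
        // Second pass over the same cached tokens; the text is not re-tokenized.
        for (String token : (CachedTokens) doc.get("body")) {
            System.out.println(token);
        }
    }
}
```

Because the derived field is computed before indexing starts, the document passed to the indexer is already complete and never has to grow while it is being consumed.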