On Fri, Jul 15, 2011 at 4:45 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi,
>
>> The crappy thing is that to actually detect if there are any tokens in the
>> field you need to make a TokenStream which can be used to read the first
>> token and then rewind again. I'm not sure if there is such a thing in Lucene
>> at the moment. We had to write it ourselves but we were on a considerably
>> older version at the time.
>
> CachingTokenFilter plugged over any other TokenStream.

Ah, quite right. If you can afford the memory it will eat (or if your
documents are all relatively small), CachingTokenFilter will work. I think in
our case it caused an OutOfMemoryError for larger character streams, which is
why we ended up falling back to a TokenStream that cached only the first token.

TX
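
For illustration, here is a minimal sketch of the "cache only the first token"
idea in plain Java. It is not the code described above and does not use the
Lucene API. TokenSource and FirstTokenBuffer are hypothetical names, and a
next() method that returns null at end of stream is an assumed stand-in for a
consuming call such as TokenStream.incrementToken().

```java
/** Hypothetical stand-in for a token stream: each call consumes one token. */
interface TokenSource {
    /** Returns the next token, or null once the stream is exhausted. */
    String next();
}

/**
 * Wraps a TokenSource and buffers only its first token, so a caller can ask
 * "is there anything here?" and still read the stream from the start.
 * Memory use is one token, regardless of stream length.
 */
final class FirstTokenBuffer implements TokenSource {
    private final TokenSource source;
    private String first;      // the single buffered token (null if the stream was empty)
    private boolean peeked;    // true once the first token has been pulled from the source
    private boolean replayed;  // true once the buffered token has been handed to the consumer

    FirstTokenBuffer(TokenSource source) {
        this.source = source;
    }

    /** Call before consuming: reports whether the stream has at least one token. */
    boolean hasTokens() {
        if (!peeked) {
            first = source.next();
            peeked = true;
        }
        return first != null;
    }

    @Override
    public String next() {
        if (peeked && !replayed) {
            // Replay the buffered token first, which acts as the "rewind".
            replayed = true;
            return first;
        }
        return source.next();
    }
}
```

The trade-off against a full cache is that this supports only a single look
at the first token, not repeated passes over the whole stream.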