On 19.11.2012 17:44, Carsten Schnober wrote:

Hi again, just a little update:

> However, after switching to Lucene 4 and TokenStreamComponents, I'm
> getting a strange behaviour: only the first document in the collection
> is tokenized properly. The others do appear in the index, but
> un-tokenized, although I have tried not to change anything in the logic.
> The Analyzer now has this createComponents() method calling the custom
> TokenStreamComponents class with my custom Tokenizer:
>
>   @Override
>   protected TokenStreamComponents createComponents(String fieldName,
>       Reader reader) {
>     final Tokenizer source = new KoraTokenizer(reader);
>     final TokenStreamComponents tokenstream = new
>         KoraTokenStreamComponents(source);
>     try {
>       source.close();
>     } catch (IOException e) {
>       jlog.error(e.getLocalizedMessage());
>       e.printStackTrace();
>     }
>     return tokenstream;
>   }

When using the packaged Analyzer.TokenStreamComponents class instead of
my custom KoraTokenStreamComponents class, the behaviour does not seem to
change:

- final TokenStreamComponents tokenstream = new KoraTokenStreamComponents(source);
+ final TokenStreamComponents tokenstream = new TokenStreamComponents(source);

Best,
Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform