Hello, please help. I am lost in the TokenStream / Token / Analyzer API. I am trying to figure out how to get the _token_itself_, i.e. the token text, while working through the "Invoking the Analyzer" example (reproduced below, and also at: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description )
The method "ts.reflectAsString(true)" returns lots of useful info:

org.apache.lucene.analysis.tokenattributes.CharTermAttribute#term=some,
org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute#bytes=[73 6f 6d 65],
org.apache.lucene.analysis.tokenattributes.OffsetAttribute#startOffset=0,
org.apache.lucene.analysis.tokenattributes.OffsetAttribute#endOffset=4,
org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute#positionIncrement=1,
org.apache.lucene.analysis.tokenattributes.TypeAttribute#type=<ALPHANUM>,
org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false

Yet how do I get the token itself, in this case "some"?

Thanks!

------ Example in the documentation --------

    Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
    Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
    OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

    try {
      ts.reset(); // Resets this stream to the beginning. (Required)
      while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));

        System.out.println("token start offset: " + offsetAtt.startOffset());
        System.out.println("  token end offset: " + offsetAtt.endOffset());
      }
      ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
      ts.close(); // Release resources associated with this stream.
    }
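
Judging from the reflectAsString dump, the token text seems to live in CharTermAttribute (the "#term=some" entry), so one plausible way to read it is to register that attribute on the stream, the same way the documentation example registers OffsetAttribute. The following is only a minimal sketch under that assumption, not a confirmed answer: the class name TokenTextSketch and the helper tokenTexts are hypothetical names of my own, Version.LUCENE_40 is an assumed substitute for LUCENE_XY, and the sketch calls addAttribute on the stream itself (ts.addAttribute), which I assume is what the documentation example intends.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene.
public class TokenTextSketch {

    // Collects the text of every token the analyzer emits for the given input.
    static List<String> tokenTexts(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> texts = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        // Register the term attribute on the stream before consuming it;
        // the same instance is updated in place on each incrementToken().
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset(); // Required before the first incrementToken().
            while (ts.incrementToken()) {
                // Copy the current term out as a String, because the
                // attribute's buffer is reused for the next token.
                texts.add(termAtt.toString());
            }
            ts.end(); // End-of-stream operations, e.g. final offset.
        } finally {
            ts.close(); // Release resources held by the stream.
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Lucene 4.0 on the classpath (core + analyzers-common).
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        for (String token : tokenTexts(analyzer, "myfield", "some text goes here")) {
            System.out.println("token text: " + token);
        }
    }
}
```

The term is copied with toString() inside the loop because the attribute object is shared across tokens; holding on to termAtt itself would only ever show the most recent token.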