Hi Dima,

Did you see my response to your earlier email? I think it's what you're looking for:
http://markmail.org/message/jdcjxauj4odyuv7e

Steve

On Dec 25, 2012, at 1:17 PM, dokondr <doko...@gmail.com> wrote:

> Hello,
> Please, help. I am lost in TokenStream / Token / Analyzer API.
> I am trying to figure out how to get _token_itself_ or token text while
> looking at "Invoking the Analyzer" example (see example below and also at:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description
> )
>
> Method "ts.reflectAsString(true))" returns lots of useful info:
> org.apache.lucene.analysis.tokenattributes.CharTermAttribute#term=some,org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute#bytes=[73 6f 6d 65],org.apache.lucene.analysis.tokenattributes.OffsetAttribute#startOffset=0,org.apache.lucene.analysis.tokenattributes.OffsetAttribute#endOffset=4,org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute#positionIncrement=1,org.apache.lucene.analysis.tokenattributes.TypeAttribute#type=<ALPHANUM>,org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false
>
> Yet, how to get token itself? In this case "some" ?
>
> Thanks!
>
> ------ Example in the documentation --------
>
> Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
> Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
> TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
> // addAttribute is a method of the TokenStream, so it must be called on ts
> OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
>
> try {
>   ts.reset(); // Resets this stream to the beginning. (Required)
>   while (ts.incrementToken()) {
>     // Use AttributeSource.reflectAsString(boolean)
>     // for token stream debugging.
>     System.out.println("token: " + ts.reflectAsString(true));
>
>     System.out.println("token start offset: " + offsetAtt.startOffset());
>     System.out.println("  token end offset: " + offsetAtt.endOffset());
>   }
>   ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
> } finally {
>   ts.close(); // Release resources associated with this stream.
> }
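
For reference, the "term=some" value in the reflectAsString output quoted above belongs to CharTermAttribute, so one way to read the token text directly is to register that attribute on the stream before consuming it. What follows is only a minimal sketch under some assumptions: it targets Lucene 4.0 (Version.LUCENE_40, with StandardAnalyzer from the analyzers-common jar on the classpath), and the class name TokenTextExample and the helper tokenTexts are made up for illustration, not part of Lucene.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical example class; only the Lucene types are real API.
public class TokenTextExample {

    // Hypothetical helper: returns the text of every token the analyzer emits.
    static List<String> tokenTexts(String text) {
        // Assumption: Lucene 4.0, where StandardAnalyzer takes a Version.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("myfield", new StringReader(text));
            // Register the term attribute before reset(); the same instance
            // is updated in place on every incrementToken() call.
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    // toString() copies the current term text out of the attribute.
                    terms.add(termAtt.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            // Rethrow unchecked to keep the helper's signature simple.
            throw new UncheckedIOException(e);
        } finally {
            analyzer.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : tokenTexts("some text goes here")) {
            System.out.println("token text: " + term);
        }
    }
}
```

Because the attribute instance is reused for each token, the sketch copies the text with toString() inside the loop instead of holding on to termAtt itself.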