Hi Steve,

Thanks for your help (I just found your e-mail in the list archive); your solution works! Below is a complete working example. However, before finding your answer, I hacked together a straw-man solution, which is a bad way to solve the problem:

    // Hack out the token - bad way!
    String tmp = ts.reflectAsString(false);
    String sameToken = (tmp.split(",")[0]).split("=")[1];
    System.out.println("*** Same token : " + sameToken);

I repeat: this is not the right way, and I give it here just for fun.

---- Complete working example ----

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ru.RussianAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.util.Version;

    Version matchVersion = Version.LUCENE_40; // Substitute the desired Lucene version
    Analyzer analyzer = new RussianAnalyzer(matchVersion); // or any other analyzer
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    // To get token strings we need this:
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);

    try {
        ts.reset(); // Resets this stream to the beginning. (Required)
        while (ts.incrementToken()) {
            // Use AttributeSource.reflectAsString(boolean)
            // for token stream debugging.
            System.out.println("token: " + ts.reflectAsString(true));

            // Right way to get tokens
            String token = termAtt.toString();
            System.out.println("*** Token: " + token);

            // Hack out the token - bad way!
            String tmp = ts.reflectAsString(false);
            String sameToken = (tmp.split(",")[0]).split("=")[1];
            System.out.println("*** Same token : " + sameToken);

            System.out.println("token start offset: " + offsetAtt.startOffset());
            System.out.println("token end offset: " + offsetAtt.endOffset());
        }
        ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
        ts.close(); // Release resources associated with this stream.
        analyzer.close();
    }

> Hi Dima,
>
> The example code you mentioned in your other recent email is pretty close.
>
> The only thing you'd probably want to add is access to the
> CharTermAttribute:
>
>     CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>
> and then in the loop over ts.incrementToken(), you can get to the output
> tokens using termAtt.buffer() and termAtt.length(), or if you're going to
> Stringify tokens anyway, termAtt.toString().
>
> Steve
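
P.S. To show why the split-based hack is fragile, here is a small standalone sketch that needs no Lucene on the classpath. It applies the same two split() calls to hand-written strings. The class name TokenHackDemo, the helper hackOutTerm, and the sample strings are my own inventions for illustration; the strings only imitate the "key=value,key=value" shape of the debug output and are not real analyzer output.

```java
class TokenHackDemo {
    // Same logic as the hack: take the first comma-separated field,
    // then return whatever sits between its first and second '='.
    static String hackOutTerm(String reflected) {
        return (reflected.split(",")[0]).split("=")[1];
    }

    public static void main(String[] args) {
        // Hypothetical debug strings, written by hand for this demo.
        String plain = "term=hello,startOffset=0,endOffset=5";
        String withComma = "term=1,5,startOffset=0,endOffset=3";
        String withEquals = "term=a=b,startOffset=0,endOffset=3";

        System.out.println(hackOutTerm(plain));      // prints "hello"
        System.out.println(hackOutTerm(withComma));  // prints "1", not "1,5"
        System.out.println(hackOutTerm(withEquals)); // prints "a", not "a=b"
    }
}
```

So the hack silently truncates any token that contains ',' or '=', and it also assumes that the term is the first field in the string. termAtt.toString() has none of these problems.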