Does StandardTokenizer remove punctuation (in Lucene 4.1)?

I'm trying to move back to StandardTokenizer from my own old custom implementation, because the newer version seems to have much better support for Asian languages.

However, this code excerpt fails at the incrementToken() assertion, implying that the !!! is removed from the output. Yet looking at the JFlex classes, I can't see anything to indicate that punctuation is removed. Is it removed, and if so, can I stop it from being removed?

Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_VERSION, new StringReader("!!!"));
assertNotNull(tokenizer);
tokenizer.reset();
assertTrue(tokenizer.incrementToken()); // this assertion fails: no token is returned for "!!!"
