Hello group, I am new to Lucene and ran into a bit of trouble while writing an app. I would like to selectively index lines from a syslog on a unix system, to this end I first wrote tokenizer that returns an entire line as a token extending CharTokenizer
protected boolean isTokenChar(char c) { return !((c == '\n') || (c == '\r')); } Perhaps that is my first mistake and I should have done things differently? I then pass this to a filter that only selects the lines with text I am interested in public final boolean incrementToken() throws IOException { while (input.incrementToken()) { Matcher lineMatcher = linePattern.matcher(termAtt.term()); if (lineMatcher.find()) //(we like the payload) return true; } //reached EOS -- return false return false; } However the issue is that, now that I have the line I want to break up the individual line into tokens along white space, but the WhitespaceTokenizer does not take a TokenStream as a constructor parameter. Can anyone offer suggestion for a workaround? Regards, Lev Bronshtein --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org