You'd be better off reading and selecting the syslog lines outside of Lucene. Then pass only the lines you are interested in to Lucene, using whatever analyzer you want.
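A minimal sketch of that approach, in plain Java with no Lucene calls. The class name SyslogLineSelector, the sample log lines, and the "sshd" pattern are all made up for illustration. It reads the input line by line, keeps only the lines the regular expression matches, and leaves the indexing step as a comment because that part depends on your Lucene version and setup.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: picks the interesting lines
// before any indexing happens.
class SyslogLineSelector {

    // Returns every line of `source` in which `linePattern` matches somewhere.
    // BufferedReader.lines() splits on \n, \r, or \r\n, so no custom
    // tokenizer is needed just to recover whole lines.
    // The caller remains responsible for closing `source`.
    static List<String> select(Reader source, Pattern linePattern) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines()
                .filter(line -> linePattern.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for a real syslog file.
        String sample =
                "Sep  5 22:01:10 host sshd[101]: Failed password for root\n"
              + "Sep  5 22:01:12 host cron[202]: job started\n"
              + "Sep  5 22:01:15 host sshd[303]: Accepted password for lev\n";

        for (String line : select(new StringReader(sample), Pattern.compile("sshd"))) {
            // In a real application, this is where each selected line would
            // be handed to Lucene as a field of a document, analyzed with
            // the analyzer of your choice (a whitespace-based one would
            // split it into tokens along white space).
            System.out.println(line);
        }
    }
}
```

Since each selected line reaches Lucene as ordinary text, the analyzer does the whitespace splitting and there is no need to chain a tokenizer behind a TokenStream.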
-- Ian.

On Sun, Sep 5, 2010 at 10:09 PM, Lev Bronshtein <lev_bronsht...@hotmail.com> wrote:
>
> Hello group,
>
> I am new to Lucene and ran into a bit of trouble while writing an app. I
> would like to selectively index lines from a syslog on a Unix system. To this
> end I first wrote a tokenizer, extending CharTokenizer, that returns an
> entire line as a token:
>
>     protected boolean isTokenChar(char c) {
>         return !((c == '\n') || (c == '\r'));
>     }
>
> Perhaps that is my first mistake and I should have done things differently?
>
> I then pass this to a filter that only selects the lines with text I am
> interested in:
>
>     public final boolean incrementToken() throws IOException
>     {
>         while (input.incrementToken())
>         {
>             Matcher lineMatcher = linePattern.matcher(termAtt.term());
>             if (lineMatcher.find()) //(we like the payload)
>                 return true;
>         }
>         //reached EOS -- return false
>         return false;
>     }
>
> However, the issue is that now that I have the line, I want to break it up
> into tokens along white space, but the WhitespaceTokenizer does not take a
> TokenStream as a constructor parameter. Can anyone offer a suggestion for a
> workaround?
>
> Regards,
>
> Lev Bronshtein
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org