You'd be better off reading and selecting the syslog lines outside of
Lucene.  Then pass only the lines you are interested in to Lucene, using
whatever analyzer you want.
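
A minimal sketch of the selection step in plain Java, with no Lucene
classes involved.  The class name SyslogLineSelector, the method
selectLines, the sample log lines and the "sshd" pattern are all made up
for illustration; substitute your own pattern and log source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: selects the interesting syslog lines before Lucene sees them.
final class SyslogLineSelector {

    // Returns the lines that contain a match for linePattern.
    // Uses find(), so the pattern may match anywhere in the line.
    static List<String> selectLines(Reader source, Pattern linePattern) throws IOException {
        List<String> selected = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {  // readLine() strips \n and \r\n
            if (linePattern.matcher(line).find()) {
                selected.add(line);
            }
        }
        return selected;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; in practice wrap a FileReader on the syslog file.
        String sample =
              "Sep  5 22:01:10 host sshd[123]: Accepted password for lev\n"
            + "Sep  5 22:01:11 host cron[456]: job started\n"
            + "Sep  5 22:01:12 host sshd[124]: Failed password for root\n";
        Pattern linePattern = Pattern.compile("sshd");  // assumed pattern of interest

        for (String line : selectLines(new StringReader(sample), linePattern)) {
            // Assumption: at this point each selected line would be added to
            // the index as the text of one document, analyzed with whatever
            // analyzer splits it the way you want (e.g. on whitespace).
            System.out.println(line);
        }
    }
}
```

Because the filtering happens before analysis, the line-as-a-token
tokenizer and the custom filter are no longer needed, and any stock
analyzer can be applied to each selected line.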


--
Ian.


On Sun, Sep 5, 2010 at 10:09 PM, Lev Bronshtein
<lev_bronsht...@hotmail.com> wrote:
>
> Hello group,
>
> I am new to Lucene and ran into a bit of trouble while writing an app.  I
> would like to selectively index lines from a syslog on a Unix system.  To
> this end I first wrote a tokenizer, extending CharTokenizer, that returns
> an entire line as a single token:
>
>   protected boolean isTokenChar(char c) {
>     return !((c == '\n') || (c == '\r'));
>   }
>
> Perhaps that is my first mistake and I should have done things differently?
>
> I then pass this to a filter that selects only the lines containing text I
> am interested in:
>
>  public final boolean incrementToken() throws IOException
>  {
>   while (input.incrementToken())
>   {
>    Matcher lineMatcher = linePattern.matcher(termAtt.term());
>    if (lineMatcher.find()) //(we like the payload)
>      return true;
>   }
>   //reached EOS -- return false
>   return false;
>  }
>
> However, now that I have the line, I want to break it up into tokens along
> whitespace, but the WhitespaceTokenizer does not take a TokenStream as a
> constructor parameter.  Can anyone offer a suggestion for a workaround?
>
> Regards,
>
> Lev Bronshtein
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
