Re: standardTokenizer - how to terminate at End of Stream

Beady Geraghty Wed, 21 Sep 2005 17:56:19 -0700

Thank you for the response.
 I was trying to do something really simple: I want to extract the context for terms and phrases from files that satisfy some (many) queries.
I *know* that file test.txt is a hit (because I queried the index, and it tells me that test.txt satisfies the query). Then I open the file and use Lucene's StandardTokenizer to tokenize the input. I get one token at a time to see which token, or which run of consecutive tokens, matches the terms/phrases. Then I extract the context surrounding these terms.
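A minimal sketch of the matching step described above, assuming the tokens have already been pulled out of the file along with their character offsets. ContextExtractor, Tok, tokenize and contexts are hypothetical names, and tokenize is only a rough stand-in (runs of letters/digits, lowercased), not Lucene's StandardTokenizer. The idea carries over because Lucene's Token also exposes startOffset() and endOffset(), which is what lets you cut the surrounding context out of the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ContextExtractor {

    // Hypothetical token record: normalized text plus offsets into the original input.
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Rough stand-in for an analyzer: runs of letters/digits, lowercased.
    static List<Tok> tokenize(String input) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isLetterOrDigit(input.charAt(i))) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                toks.add(new Tok(input.substring(start, i).toLowerCase(Locale.ROOT), start, i));
            } else {
                i++;
            }
        }
        return toks;
    }

    // For each occurrence of the phrase (a sequence of consecutive tokens), return the
    // original text spanning `window` extra tokens on each side of the match.
    static List<String> contexts(String input, String[] phrase, int window) {
        List<Tok> toks = tokenize(input);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + phrase.length <= toks.size(); i++) {
            boolean match = true;
            for (int k = 0; k < phrase.length; k++) {
                if (!toks.get(i + k).text.equals(phrase[k])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                int first = Math.max(0, i - window);
                int last = Math.min(toks.size() - 1, i + phrase.length - 1 + window);
                // Cut from the original input so the reader sees the real text.
                out.add(input.substring(toks.get(first).start, toks.get(last).end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog. A brown fox sleeps.";
        for (String c : contexts(text, new String[] {"brown", "fox"}, 1)) {
            System.out.println(c);
        }
    }
}
```

With a window of 1, main prints "quick brown fox jumps" and "A brown fox sleeps". Matching is done on lowercased tokens, but the snippets are cut from the original text, so the original capitalization survives.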
 I didn't try the highlighter because I don't really need to "highlight", and I didn't look closely at whether some of the classes provided in that package would already do what I need. (I would imagine many people have already done what I'm trying to do. The package appears to have a fragmenter, and I don't know if that is something I need.)
 Since I used the StandardAnalyzer when I originally created the index, I use the StandardTokenizer to tokenize the input stream.
 Is there a better way to do what I'm trying to do?
  From your comment below, it appears that I should just use next() instead of getNextToken(). Is that correct?
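For reference, a sketch of the loop shape this question is about, assuming the Lucene 1.x TokenStream contract in which next() hands back one token per call and returns null once the input is exhausted (so with a real tokenizer the loop would be along the lines of `while ((t = tokenizer.next()) != null)`). SimpleTokenStream and TokenLoop are hypothetical stand-ins so the example runs without Lucene on the classpath; they only mimic that null-at-end convention.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene-style TokenStream: next() returns one
// token per call and null at end of stream (it does not throw at EOF).
class SimpleTokenStream {
    private final Reader in;

    SimpleTokenStream(Reader in) {
        this.in = in;
    }

    String next() throws IOException {
        int c = in.read();
        // Skip delimiters between tokens.
        while (c != -1 && !Character.isLetterOrDigit(c)) {
            c = in.read();
        }
        if (c == -1) {
            return null; // end of stream
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && Character.isLetterOrDigit(c)) {
            sb.append((char) c);
            c = in.read();
        }
        // The delimiter (or -1) that ended this token is simply dropped.
        return sb.toString();
    }
}

class TokenLoop {
    // Pull tokens until next() signals end of stream with null.
    static List<String> readAll(Reader r) throws IOException {
        SimpleTokenStream ts = new SimpleTokenStream(r);
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("Hello, Lucene world!")));
    }
}
```

Using null as the end-of-stream signal keeps the calling loop to a single condition, with no sentinel token kind to compare against.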
 Thanks



 On 9/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> Could you elaborate on what you're trying to do, please?
>
> Using StandardTokenizer in this low-level fashion is practically
> unheard of, so I think knowing what you're attempting to do will help
> us help you :)
>
> Erik
>
>
>
