On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote:

Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle apostrophes. So, the input "I don't know" produces these tokens:

    don
    t
    know

Is that right?
It's not right. StopAnalyzer does tokenize by letter (it splits on every non-letter character), but 't' is a stopword, so the tokens are:
    don
    know
Phew, that's much more useful.
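
For anyone following along, here is a rough Python sketch of the behavior described above: split on non-letters, lowercase, then drop stopwords. It is not Lucene's code. The function names and the stop set are assumptions made up for illustration, and the stop set is a small hypothetical subset chosen only to reproduce the tokens quoted in this thread, not Lucene's actual list.

```python
import re

# Hypothetical stop set for illustration only; NOT Lucene's actual list.
# It is chosen so the output matches the tokens quoted in this thread.
STOP_WORDS = {"a", "an", "and", "i", "s", "t", "the", "to"}


def letter_tokens(text):
    """Split on any non-letter character and lowercase each token.

    Assumption: ASCII letters only, to keep the sketch short.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]


def stop_analyze(text, stop_words=STOP_WORDS):
    """Letter-tokenize, then discard tokens found in the stop set."""
    return [tok for tok in letter_tokens(text) if tok not in stop_words]


if __name__ == "__main__":
    print(letter_tokens("I don't know"))
    print(stop_analyze("I don't know"))
```

The apostrophe is just another non-letter here, so "don't" splits into "don" and "t" before the stop filter ever sees it; the filter then removes "t".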

Naturally. But if you actually cared about apostrophes you would be using StandardAnalyzer instead, right?

I'd be using PolyAnalyzer ;)

<http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Analysis/PolyAnalyzer.html>

... unless I were trying to duplicate the behavior of StopAnalyzer for benchmarking purposes.

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

