Marvin Humphrey wrote:
I wrote:

It looks like StopAnalyzer tokenizes by letter, and doesn't handle apostrophes. So, the input "I don't know" produces these tokens:

    don
    t
    know

Is that right?

It's not right. StopAnalyzer does tokenize letter by letter, but 't' is a stopword, so the tokens are:

    don
    know

Phew, that's much more useful.

Naturally. But if you actually cared about apostrophes you would be using StandardAnalyzer instead, right?

Daniel


--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to