This all changed with the 3.1 release.  See
http://lucene.apache.org/java/3_1_0/changes/Changes.html#3.1.0.api_changes,
number 18.

You can get the old behaviour with StandardAnalyzer by passing
Version.LUCENE_30, or you could look at UAX29URLEmailTokenizer, which
should pick up the email component, although probably not the apostrophe.
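
A rough sketch of how the two options could be compared, assuming the
Lucene 3.1/3.2 jars are on the classpath. The class name TokenDump, the
field name "body" and the address user@example.com are made up for
illustration, and the single-argument UAX29URLEmailTokenizer constructor
is the 3.1/3.2 form (later 3.x releases also take a Version argument):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene.
public class TokenDump {

    // Consumes the stream and returns one "term type" string per token.
    static List<String> describe(TokenStream ts) throws IOException {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        TypeAttribute type = ts.addAttribute(TypeAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString() + " " + type.type());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder sentence; the address is invented for this example.
        String text = "i'll email you at user@example.com";

        // Option 1: StandardAnalyzer with the pre-3.1 (classic) grammar.
        StandardAnalyzer classic = new StandardAnalyzer(Version.LUCENE_30);
        System.out.println("StandardAnalyzer(LUCENE_30):");
        for (String t : describe(classic.tokenStream("body", new StringReader(text)))) {
            System.out.println("  " + t);
        }

        // Option 2: the UAX#29 tokenizer that also recognises URLs and emails.
        // Note this is a bare tokenizer: no lowercasing or stop-word removal.
        System.out.println("UAX29URLEmailTokenizer:");
        for (String t : describe(new UAX29URLEmailTokenizer(new StringReader(text)))) {
            System.out.println("  " + t);
        }
    }
}
```

With LUCENE_30 the address should come back as a single token with an
email type; check the printed types against your own version before
relying on them.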


--
Ian.


On Thu, Sep 29, 2011 at 7:51 PM, Peyman Faratin <pey...@robustlinks.com> wrote:
> Hi
>
> I have a sentence
>
> "i'll email you at x...@abc.com"
>
> and I am looking at the tokens a StandardAnalyzer (which uses the 
> StandardTokenizer) produces
>
> 1: [i'll:0->4:<ALPHANUM>]
> 2: [email:5->10:<ALPHANUM>]
> 3: [you:11->14:<ALPHANUM>]
> 5: [x:18->19:<ALPHANUM>]
> 6: [abc.com:20->27:<ALPHANUM>]
>
> I am using the following constructor
>
>    new StandardAnalyzer(Version.LUCENE_32),
>
> My question is:
>
> 1- shouldn't we be seeing a token x...@abc.com (since that is the grammar of
> StandardAnalyzer)?, and
>
> 2- shouldn't the token type be "email" for abc.com and "apostrophe" for 
> "i'll"?
>
> thank you
>
> Peyman

