Thank you, Ian.

On Sep 30, 2011, at 4:19 AM, Ian Lea wrote:
> This all changed with the 3.1 release. See
> http://lucene.apache.org/java/3_1_0/changes/Changes.html#3.1.0.api_changes,
> number 18.
>
> You can get the old behaviour with StandardAnalyzer by passing
> Version.LUCENE_30, or you could look at UAX29URLEmailTokenizer, which
> should pick up the email component, although probably not the apostrophe.
>
> --
> Ian.
>
> On Thu, Sep 29, 2011 at 7:51 PM, Peyman Faratin <pey...@robustlinks.com>
> wrote:
>> Hi
>>
>> I have a sentence
>>
>> "i'll email you at x...@abc.com"
>>
>> and I am looking at the tokens a StandardAnalyzer (which uses the
>> StandardTokenizer) produces
>>
>> 1: [i'll:0->4:<ALPHANUM>]
>> 2: [email:5->10:<ALPHANUM>]
>> 3: [you:11->14:<ALPHANUM>]
>> 5: [x:18->19:<ALPHANUM>]
>> 6: [abc.com:20->27:<ALPHANUM>]
>>
>> I am using the following constructor
>>
>> new StandardAnalyzer(Version.LUCENE_32),
>>
>> My questions are:
>>
>> 1- shouldn't we be seeing a token x...@abc.com (since that is the
>> grammar of StandardAnalyzer)? and
>>
>> 2- shouldn't the token type be "email" for abc.com and "apostrophe" for
>> "i'll"?
>>
>> thank you
>>
>> Peyman
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
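
For anyone who wants to compare the two behaviours discussed in this thread, here is a minimal sketch that prints tokens in the same [term:start->end:type] form as the original question. It assumes a Lucene 3.2 or later 3.x jar on the classpath and uses the Version.LUCENE_30 constant for the 3.0 compatibility setting. The class name TokenDump, the field name "body", and the address user@example.com are placeholders made up for illustration; they are not from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class (not part of Lucene): dumps the tokens an
// analyzer produces, one string per token.
public class TokenDump {

    static List<String> dump(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        // "body" is an arbitrary field name; StandardAnalyzer ignores it.
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
        TypeAttribute type = ts.addAttribute(TypeAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add("[" + term.toString() + ":" + offset.startOffset()
                        + "->" + offset.endOffset() + ":" + type.type() + "]");
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder sentence; the address in the original post is obfuscated.
        String text = "i'll email you at user@example.com";

        // Pre-3.1 tokenization, as Ian suggests.
        System.out.println("LUCENE_30: " + dump(new StandardAnalyzer(Version.LUCENE_30), text));

        // The constructor used in the original question.
        System.out.println("LUCENE_32: " + dump(new StandardAnalyzer(Version.LUCENE_32), text));
    }
}
```

Running both lines side by side should make the change in token boundaries and token types visible without having to read the grammar files.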