Strange behaviour of StandardTokenizer

Anna Hunecke Thu, 17 Jun 2010 06:32:11 -0700

Hi!

I ran into some strange behaviour in the StandardTokenizer. Terms containing a '-'
are tokenized differently depending on the characters around the hyphen.
For example, the term 'nl-lt' is split into the two tokens 'nl' and 'lt',
but the term 'nl-lt0' is kept as the single token 'nl-lt0'.
Is this a bug or a feature? Can I avoid it somehow?
I'm using Lucene 3.0.0.
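
A minimal sketch to reproduce this, assuming the Lucene 3.0.0 core jar is on the classpath. The class name TokenizeDemo and the helper tokenize are only illustrative names, and the outputs noted in the comments are the ones described above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TokenizeDemo {

    // Illustrative helper: runs StandardTokenizer over the text
    // and collects the term text of every token it emits.
    static List<String> tokenize(String text) throws IOException {
        TokenStream stream =
                new StandardTokenizer(Version.LUCENE_30, new StringReader(text));
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        try {
            while (stream.incrementToken()) {
                tokens.add(term.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Observed: split at the hyphen into 'nl' and 'lt'
        System.out.println(tokenize("nl-lt"));
        // Observed: kept whole as 'nl-lt0'
        System.out.println(tokenize("nl-lt0"));
    }
}
```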


Best,
Anna



