Hello,
I've been using Lucene for a few weeks now in a small project and just
ran into a problem. My index contains terms with one or more
underscores, e.g. XYZZZY_DE_SA0001 or XYZZZY_AT0001. Unfortunately the
tokenizer splits such a term into multiple tokens at the underscores,
except
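For illustration, the splitting behaviour described above can be sketched in plain Java. This is a hypothetical stand-in for the tokenizer's letter-or-digit rule, not Lucene code: any character that is not a letter or digit (including '_') ends the current token.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Collect maximal runs of letters/digits; everything else,
    // including '_', acts as a token boundary.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // The underscore-containing term is broken apart:
        System.out.println(tokenize("XYZZZY_DE_SA0001"));
        // → [XYZZZY, DE, SA0001]
    }
}
```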
Hello,
first of all, thanks to everyone for the replies and suggestions. I
solved my problem by adapting StandardTokenizer.jj and recompiling it
with JavaCC.
I replaced line 90:

  <ALPHANUM: (<LETTER>|<DIGIT>)+ >

with

  <ALPHANUM: (<LETTER>|<DIGIT>|"_")+ >
so that the underscore is treated like an alphanumeric character. In my
first tests, it seems to work.
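The effect of that grammar change can be sketched in plain Java as well (again a hypothetical stand-in, not the generated JavaCC tokenizer): once '_' is accepted inside a token, the whole identifier survives unsplit.

```java
import java.util.ArrayList;
import java.util.List;

public class UnderscoreDemo {
    // Same tokenizer sketch as before, but '_' is now accepted like a
    // letter or digit, mirroring the modified ALPHANUM rule.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '_') {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // Underscore terms stay whole; whitespace still separates tokens:
        System.out.println(tokenize("XYZZZY_DE_SA0001 and XYZZZY_AT0001"));
        // → [XYZZZY_DE_SA0001, and, XYZZZY_AT0001]
    }
}
```

Note that a grammar change like this affects every field analyzed with the modified tokenizer, so terms elsewhere in the index that contain underscores will also stop being split.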