Paul Libbrecht wrote:
Hello fellows of Lucene,
I just discovered that the _ character is a word separator in the
StandardAnalyzer.
Can it be?
It broke our usage of a field that stores a comma-separated list of
"uri-fragments".

If I were analysing a URI, I would not be using StandardAnalyzer, but
something that splits only on what is special for a URI. You wouldn't
even want to break on a hyphen, normally.
In your case, you are breaking it up already so you could just make that
your analyser. Or if you want to keep breaking it up before it gets put
into Lucene, wouldn't a trivial analyser which breaks on commas be the
way to go?
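
To make the comma-only idea concrete, here is a minimal sketch of just the
splitting rule in plain Java. It is not a Lucene Analyzer; the class name,
method name and sample fragments are made up for illustration. In Lucene the
same rule would sit inside a tokenizer (for example a CharTokenizer subclass
whose token-character test rejects only the comma), and the exact method
signatures depend on the Lucene version you are on.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSplitSketch {

    // Hypothetical helper: splits a field value on commas only, so that
    // characters such as '_' and '-' stay inside each fragment.
    public static List<String> splitOnCommas(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        for (String part : fieldValue.split(",")) {
            String token = part.trim();   // drop whitespace around each entry
            if (!token.isEmpty()) {       // skip empty entries such as "a,,b"
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample value is an assumption, not taken from the original field.
        System.out.println(splitOnCommas("intro_section, chapter-2,appendix_a"));
    }
}
```

With a rule like this, "intro_section" and "chapter-2" each come out as a
single token, which is what an exact match on a fragment needs.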
Daniel
--
Daniel Noll
Senior Developer, Nuix
http://nuix.com/
Forensic and eDiscovery Software: the world's most advanced email data
analysis and eDiscovery software