Paul Libbrecht wrote:

> Hello fellows of Lucene,
>
> I just discovered that the _ character is a word separator in the StandardAnalyzer.
> Can it be?
> It broke our usage of a field that stores a comma-separated list of "uri-fragments".

If I were analysing a URI, I would not use StandardAnalyzer, but something that splits only on the characters that are special in a URI. Normally you wouldn't even want to break on a hyphen.

In your case, you are already breaking the list up yourself, so you could just make that logic your analyser. Or, if you want to keep breaking it up before it gets put into Lucene, wouldn't a trivial analyser that breaks only on commas be the way to go?
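
To illustrate, here is a minimal sketch of the comma-only splitting rule in plain Java. It deliberately does not use the Lucene API: the class name CommaTokens and its split method are hypothetical and exist only to show which tokens such an analyser would emit. Trimming whitespace around each fragment and dropping empty fragments are assumptions, not something the original field format is known to need. In Lucene itself the same rule would typically live in a small tokenizer, for example a CharTokenizer subclass whose isTokenChar rejects only the comma; the exact signature depends on the Lucene version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows comma-only tokenisation.
class CommaTokens {

    // Splits the field value on commas only, so '_' and '-' stay inside a token.
    static List<String> split(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < fieldValue.length(); i++) {
            char c = fieldValue.charAt(i);
            if (c == ',') {
                addIfNotEmpty(tokens, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotEmpty(tokens, current);
        return tokens;
    }

    // Assumption: surrounding whitespace is not significant and empty fragments are skipped.
    private static void addIfNotEmpty(List<String> tokens, StringBuilder sb) {
        String token = sb.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
    }

    public static void main(String[] args) {
        // prints [chapter_1-intro, sec_2.3, appendix-a]
        System.out.println(split("chapter_1-intro, sec_2.3,appendix-a"));
    }
}
```

With this rule, a fragment such as chapter_1-intro survives as a single token, which is what a field holding a list of URI fragments needs for exact matching.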

Daniel


--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software
