Sure, all of this is possible; I know how to write my own analyzer. I just ran into this in an existing solution that uses StandardAnalyzer for almost everything because it is generic, so I wanted to know whether there was a rationale for it.
Of course, URIs only need trivial analyzers.
paul

On 16 March 2009, at 00:03, Daniel Noll wrote:
> Paul Libbrecht wrote:
>> Hello fellows of Lucene,
>>
>> I just discovered that the _ character is a word separator in the StandardAnalyzer. Can it be?
>> It broke our usage of a field that stores a comma-separated list of "uri-fragments".
>
> If I were analysing a URI, I would not be using StandardAnalyzer, but something that splits only on what is special for a URI. You wouldn't even want to break on a hyphen, normally.
>
> In your case, you are breaking it up already, so you could just make that your analyser. Or, if you want to keep breaking it up before it gets put into Lucene, wouldn't a trivial analyser which breaks on commas be the way to go?
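A minimal sketch of the comma-splitting rule suggested above, written in plain Java rather than against Lucene's own Tokenizer classes (their base classes and method signatures differ between Lucene versions, so that wiring is left out). The class name `CommaSplitSketch` and the method `splitOnCommas` are hypothetical. The point it illustrates is that splitting only on commas keeps `_` and `-` inside each uri-fragment, whereas a general-purpose word tokenizer may break on them.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSplitSketch {

    // Hypothetical helper: splits a comma-separated field value into tokens,
    // trimming surrounding whitespace and dropping empty entries.
    // Characters such as '_' and '-' are never treated as separators.
    static List<String> splitOnCommas(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Each uri-fragment survives intact, underscores and hyphens included.
        List<String> tokens = splitOnCommas("algebra_1, geometry-basics,,calc_intro ");
        System.out.println(tokens); // prints [algebra_1, geometry-basics, calc_intro]
    }
}
```

Inside Lucene, the same rule would live in a custom Tokenizer (for example one that treats every character except the comma as part of a token); check the API documentation for the Lucene version in use before relying on any particular base class.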