Thanks for your helpful suggestions.
I have considered other analyzers, but WDF has great strengths. I will
experiment with maintaining the transitions first and then consider
modifying the code.
F. Knudson
Mike Klaas wrote:
>
> On 30-Sep-07, at 12:47 PM, F Knudson wrote:
>
>>
>> Is there a flag to disable the letter-number transition in the
>> solr.WordDelimiterFilterFactory? We are indexing category codes and
>> thesaurus codes, for which this letter-number transition makes no
>> sense. It is bloating the index (which is already large).
>
> Have you considered using a different analyzer?
>
> If you want to continue using WDF, you could make a quick change
> around line 320:
>
>     if (splitOnCaseChange == 0 &&
>         (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
>       // ALPHA->ALPHA: always ignore if case isn't considered.
>
>     } else if ((lastType & UPPER) != 0 && (type & LOWER) != 0) {
>       // UPPER->LOWER: Don't split
>     } else {
>
> ...
>
> by adding a clause that catches ALPHA -> NUMERIC (and vice versa) and
> ignores it.
>
> Another approach that I am using locally is to maintain the
> transitions, but force tokens to be a minimum size (so r2d2 doesn't
> tokenize to four tokens but arrr2222deee2222 does).
>
> There is a patch here: http://issues.apache.org/jira/browse/SOLR-293
>
> If you vote for it, I promise to get it in for 1.3 <g>
>
> -Mike
>
>
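
For anyone reading this thread later, here is a rough, self-contained
sketch of the kind of clause suggested above: one that catches
ALPHA -> NUMERIC (and vice versa) and ignores it. This is not the actual
WordDelimiterFilter code. The class TransitionSketch, the
splitOnNumerics parameter, and the helper methods are hypothetical names
made up for illustration; only the ALPHA/UPPER/LOWER flag style follows
the quoted excerpt. Case changes are always ignored here to keep the
example short.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative toy splitter only; NOT Solr's WordDelimiterFilter.
class TransitionSketch {
    // Character-type bit flags, styled after those in the quoted excerpt.
    static final int LOWER = 1;
    static final int UPPER = 2;
    static final int DIGIT = 4;
    static final int OTHER = 8;
    static final int ALPHA = LOWER | UPPER;

    static int charType(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Decides whether to split between two adjacent characters.
    // splitOnNumerics is a hypothetical flag, assumed for this sketch.
    static boolean isBoundary(int lastType, int type, boolean splitOnNumerics) {
        if ((lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
            // ALPHA->ALPHA: never split (case changes ignored in this sketch).
            return false;
        }
        boolean alphaToDigit = (lastType & ALPHA) != 0 && (type & DIGIT) != 0;
        boolean digitToAlpha = (lastType & DIGIT) != 0 && (type & ALPHA) != 0;
        if (!splitOnNumerics && (alphaToDigit || digitToAlpha)) {
            // The added clause: ignore letter<->number transitions.
            return false;
        }
        // Any other change of character type is a split point.
        return lastType != type;
    }

    // Splits s at every boundary. Non-alphanumeric runs are kept as
    // parts here, purely to keep the example small.
    static List<String> split(String s, boolean splitOnNumerics) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= s.length(); i++) {
            if (i == s.length()
                    || isBoundary(charType(s.charAt(i - 1)),
                                  charType(s.charAt(i)),
                                  splitOnNumerics)) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        return parts;
    }
}
```

With splitOnNumerics set to true, TransitionSketch.split("r2d2", true)
produces the four parts r, 2, d, 2; with it set to false, the code stays
whole as r2d2, which is the behaviour wanted for category and thesaurus
codes.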
--
View this message in context:
http://www.nabble.com/Letter-number-transitions---can-this-be-turned-off-tf4544769.html#a13003019
Sent from the Solr - User mailing list archive at Nabble.com.