Re: Letter-number transitions - can this be turned off

F Knudson Tue, 02 Oct 2007 10:21:30 -0700

Thanks for your helpful suggestions.

I have considered other analyzers, but WDF has great strengths. I will
experiment with maintaining the transitions and then consider modifying the
code.


F. Knudson


Mike Klaas wrote:
> 
> On 30-Sep-07, at 12:47 PM, F Knudson wrote:
> 
>>
>> Is there a flag to disable the letter-number transition in the
>> solr.WordDelimiterFilterFactory? We are indexing category codes and
>> thesaurus codes, for which this letter-number transition makes no sense.
>> It is bloating the index (which is already large).
> 
> Have you considered using a different analyzer?
> 
> If you want to continue using WDF, you could make a quick change
> around line 320:
> 
>              if (splitOnCaseChange == 0 &&
>                  (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
>                // ALPHA->ALPHA: always ignore if case isn't considered.
> 
>              } else if ((lastType & UPPER)!=0 && (type & LOWER)!=0) {
>                // UPPER->LOWER: Don't split
>              } else {
> 
>           ...
> 
> by adding a clause that catches ALPHA -> NUMERIC (and vice versa) and  
> ignores it.
> 
> Another approach that I am using locally is to maintain the  
> transitions, but force tokens to be a minimum size (so r2d2 doesn't  
> tokenize to four tokens but arrr2222deee2222 does).
> 
> There is a patch here: http://issues.apache.org/jira/browse/SOLR-293
> 
> If you vote for it, I promise to get it in for 1.3 <g>
> 
> -Mike
> 
> 
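To make Mike's first suggestion concrete, here is a standalone Java sketch of the extra clause that ignores ALPHA -> NUMERIC (and NUMERIC -> ALPHA) transitions. It is not a patch to the real WordDelimiterFilter: the class TransitionSplitter, the constants LOWER, UPPER, DIGIT and ALPHA, and the splitOnLetterNumber switch are hypothetical stand-ins that only mimic the structure of the excerpt quoted above, and the input is assumed to be purely alphanumeric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, standalone re-creation of WordDelimiterFilter-style splitting.
// Constants and names are illustrative only, not Solr's actual code.
final class TransitionSplitter {
    static final int LOWER = 0x01;
    static final int UPPER = 0x02;
    static final int DIGIT = 0x04;
    static final int ALPHA = LOWER | UPPER;

    // Classifies one character; assumes alphanumeric input (others map to 0).
    static int charType(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return 0;
    }

    // Splits a word into subwords at character-type transitions.
    static List<String> split(String word, boolean splitOnCaseChange,
                              boolean splitOnLetterNumber) {
        List<String> parts = new ArrayList<>();
        if (word.isEmpty()) return parts;
        int start = 0;
        int lastType = charType(word.charAt(0));
        for (int i = 1; i < word.length(); i++) {
            int type = charType(word.charAt(i));
            if (type == lastType) {
                // Same character class: no transition.
            } else if (!splitOnCaseChange
                    && (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
                // ALPHA->ALPHA: ignored if case isn't considered.
            } else if ((lastType & UPPER) != 0 && (type & LOWER) != 0) {
                // UPPER->LOWER: don't split (keeps "Power" whole).
            } else if (!splitOnLetterNumber
                    && (((lastType & ALPHA) != 0 && (type & DIGIT) != 0)
                        || ((lastType & DIGIT) != 0 && (type & ALPHA) != 0))) {
                // Added clause: ALPHA->DIGIT and DIGIT->ALPHA are ignored.
            } else {
                // Any other transition starts a new subword.
                parts.add(word.substring(start, i));
                start = i;
            }
            lastType = type;
        }
        parts.add(word.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("r2d2", true, true));        // [r, 2, d, 2]
        System.out.println(split("r2d2", true, false));       // [r2d2]
        System.out.println(split("PowerShot", true, false));  // [Power, Shot]
    }
}
```

With the switch off, a code such as r2d2 stays a single token, while case-change splitting (PowerShot -> Power, Shot) keeps working, so only the letter-number behaviour changes.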

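A standalone Java sketch of the minimum-size approach Mike mentions (keep the transitions, but refuse to produce pieces that are too short). This is not the SOLR-293 patch: the class MinSizeSplitter and the parameter minPartLength are hypothetical, the input is assumed to be alphanumeric, and the rule used here (split only if every piece is long enough, otherwise keep the whole word) is just one simple policy that reproduces the r2d2 / arrr2222deee2222 example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a minimum-subword-size rule; not the SOLR-293 patch.
final class MinSizeSplitter {
    // Splits at every letter<->digit boundary; assumes alphanumeric input.
    static List<String> splitOnLetterNumber(String word) {
        List<String> parts = new ArrayList<>();
        if (word.isEmpty()) return parts;
        int start = 0;
        for (int i = 1; i < word.length(); i++) {
            boolean prevDigit = Character.isDigit(word.charAt(i - 1));
            boolean curDigit = Character.isDigit(word.charAt(i));
            if (prevDigit != curDigit) {
                parts.add(word.substring(start, i));
                start = i;
            }
        }
        parts.add(word.substring(start));
        return parts;
    }

    // Keeps the transitions only if every piece has at least minPartLength
    // characters; otherwise the word is emitted as a single token.
    static List<String> split(String word, int minPartLength) {
        List<String> parts = splitOnLetterNumber(word);
        for (String part : parts) {
            if (part.length() < minPartLength) {
                return Collections.singletonList(word);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("r2d2", 3));              // [r2d2]
        System.out.println(split("arrr2222deee2222", 3));  // [arrr, 2222, deee, 2222]
    }
}
```

Keeping the whole word when any piece is too short is the simplest policy; merging short pieces into a neighbouring piece would be an alternative with different indexing trade-offs.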
-- 
View this message in context: 
http://www.nabble.com/Letter-number-transitions---can-this-be-turned-off-tf4544769.html#a13003019
Sent from the Solr - User mailing list archive at Nabble.com.
