Re: Looking for a MappingCharFilter that accepts regular expressions

Paul Taylor Mon, 14 Dec 2009 23:33:24 -0800

Koji Sekiguchi wrote:

Koji Sekiguchi wrote:
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token. I want 'No. 1' to become 'No.1', and I need to do this before tokenizing because the tokenizer would split one value into two terms and the other into just one term. I already use a NormalizeMapFilter to map '&' to 'and', but I think it only takes literal text and I need to:
1. be case insensitive (but LowerCaseFilter is only applied after tokenizing)
2. cope with all numbers, e.g. no. 109
So I was going to subclass BaseCharFilter and do my matches with a regular expression like ([Nn]+[Oo]+\\.) ([0-9]+), but I'm struggling to understand the offset methods you have to call once you get a match. Has anyone already got a regular expression CharFilter, or am I approaching this all wrong?
thanks Paul
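
The rewrite and offset bookkeeping described above can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions: the class name NoDotNormalizer and its methods are hypothetical, the pattern uses Pattern.CASE_INSENSITIVE and \s+ rather than the exact expression quoted in the message, and the corrections list only mimics the idea of mapping an offset in the rewritten text back to the original text. It is not the SOLR-1653 patch and not the BaseCharFilter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collapses "No. 1" to "No.1" before tokenizing and
// remembers how far each later offset has shifted.
final class NoDotNormalizer {
    // Assumption: "no." in any case, then whitespace, then digits.
    private static final Pattern NO_NUMBER =
            Pattern.compile("\\b(no\\.)\\s+([0-9]+)", Pattern.CASE_INSENSITIVE);

    private final StringBuilder out = new StringBuilder();
    // Each entry is {outputOffset, cumulativeCharsRemoved}.
    private final List<int[]> corrections = new ArrayList<>();

    NoDotNormalizer(String input) {
        Matcher m = NO_NUMBER.matcher(input);
        int last = 0;     // end of the previous match in the input
        int removed = 0;  // total characters dropped so far
        while (m.find()) {
            out.append(input, last, m.start()); // text before the match, unchanged
            out.append(m.group(1));             // "No." kept in its original case
            removed += m.start(2) - m.end(1);   // the whitespace being dropped
            // From this output position on, input offset = output offset + removed.
            corrections.add(new int[] {out.length(), removed});
            out.append(m.group(2));             // the digits
            last = m.end();
        }
        out.append(input, last, input.length()); // remaining tail
    }

    // The rewritten text that a tokenizer would see.
    String text() {
        return out.toString();
    }

    // Maps an offset in the rewritten text back to the original input.
    int correctOffset(int outputOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] <= outputOffset) {
                diff = c[1];
            } else {
                break;
            }
        }
        return outputOffset + diff;
    }
}
```

For example, new NoDotNormalizer("Symphony No. 1 and no.  109").text() gives "Symphony No.1 and no.109", and correctOffset(12) returns 13, the position of the '1' in the original string. Keeping the matched case and lowercasing later leaves case handling to the normal token filters.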



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Hi Paul,

I've written a patch for this kind of purpose. See:

https://issues.apache.org/jira/browse/SOLR-1653

Koji
Oops. I thought this was the solr-user list, but it was java-user. :-D

Koji

Hi Koji

Just saw your post. Could you send me a link to just the PatternReplaceCharFilter.java file, please?

Paul



