Koji Sekiguchi wrote:
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our context its one token I want 'No. 1' to become 'No.1', I need to do this before tokenizing because the tokenizer would split one value into two terms and one into just one term. I already use a NormalizeMapFilter to map &' to 'and' but I think it only takes literal text and I need to

1. be case insensitive (but lowercasefilter is only applied after tokenizing)

2. cope with all numbers e.g no. 109

So I was going to subclass BaseCharFilter and do my matches with a regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling to understand the offset methods you have to do once you get a match. Has anyone already got a regular expression Charfilter OR am I approaching this all wrong

thanks Paul



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


Hi Paul,

I've written a patch for this kind of purpose. See:

https://issues.apache.org/jira/browse/SOLR-1653

Koji

Oops. I thought this is solr-user list, but it was java-user. :-D

Koji

--
http://www.rondhuit.com/en/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to