I'm not sure how useful this reply is, but hey ;)

<aol>me too!</aol>

I do something vaguely similar: I have to strip accents from characters such as e-acute in both my input data and my incoming search queries, so that both end up in one standard form. I do this with a custom TokenFilter subclass. My analyzer includes this filter along with some of the standard ones (LowerCaseFilter, etc.). I run the same analyzer at indexing time and at search time, as has been discussed in other posts.

My point is that I'm happy with this approach and I'd recommend you do a similar thing, at least as a first attempt.
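
To make that a bit more concrete, here is a rough sketch of the kind of normalization such a filter could apply to each token's text. It is plain Java with no Lucene dependency; TermNormalizer, stripAccents and canonicalCode are names I made up for illustration, not Lucene classes or methods. It also assumes the whole product code reaches the filter as a single token (if your tokenizer splits on whitespace, "21 MA GAB" will already be three tokens by then), so treat it as a starting point rather than a finished solution.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper (not part of Lucene): puts a term into one canonical form. */
final class TermNormalizer {
    private TermNormalizer() {}

    /** Strips accents, e.g. e-acute becomes a plain "e". */
    static String stripAccents(String s) {
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches those combining marks, which are then dropped.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Canonical form for product codes: no accents, no punctuation or spaces, lower case. */
    static String canonicalCode(String s) {
        String noAccents = stripAccents(s);
        // Remove everything that is not a letter or a digit (hyphens, underscores, spaces, ...).
        String lettersAndDigits = noAccents.replaceAll("[^\\p{L}\\p{N}]+", "");
        return lettersAndDigits.toLowerCase(Locale.ROOT);
    }
}
```

With this, "21-MA-GAB", "21 MA GAB", "21-MA_GAB" and "21MAGAB" all come out as the same term, so as long as the same normalization runs on the indexed data and on the query, no synonyms are needed for the punctuation variants.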

Cheers,
Peter Pimley



Aigner, Thomas wrote:

Hello all,

        I am VERY new to Lucene and we are trying out Lucene to see if
it will accomplish the vast majority of our search functions.

        I have a question about a good way to index some of our product
description codes.  We have description codes like 21-MA-GAB and other
punctuation. Our users need to be able to search for "21 MA GAB" or
"21-MA_GAB" or "21MAGAB". Is the best way to accomplish this by
creating synonyms for the 3 different ways when punctuation is in parts
to search for? I know I can stop punctuation in the index but what about
grouping the information together or with spaces?

Thanks all in advance,
Tom


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


