Re: Filtering accents

Matthew Hall Tue, 30 Dec 2008 06:23:08 -0800

If you are constrained in such a way as to not use the French Analyzer you might instead consider transforming the input as an additional step at both search/indexing time.

Use something like a regex that looks for é and always replaces it with e, both in the index and at search time. (Expand this transformation step as needed.)
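
As a rough sketch of that transformation step: instead of writing one regex per accented letter, the version below decomposes each character with java.text.Normalizer and then strips the combining marks with a single regex. This is a swapped-in technique, not the literal per-character replacement described above, and it assumes Java 6 or later. The class name AccentFolder and the method fold are made up for illustration. You would call fold() on the text before handing it to the analyzer at index time, and on the query string at search time.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene.
final class AccentFolder {

    // Split accented characters into base letter + combining mark (NFD),
    // then delete the combining marks, e.g. "\u00e9" becomes "e".
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "m\u00e9tal" is "métal" written with a Unicode escape.
        System.out.println(fold("m\u00e9tal"));
    }
}
```

Characters that have no canonical decomposition, such as œ or ø, pass through unchanged, so they would still need an explicit replacement if they matter for your data.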

You likely also need to store the original word somewhere, so I would suggest adding a second stored, but unindexed field that stores the original value of the word, so when you match on your search criteria, you will also get the original form of the word in your hits object.
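
To illustrate the idea of matching on the folded form while returning the original, here is a plain-Java sketch that does not use Lucene at all. The map key plays the role of the indexed (folded) field and the list values play the role of the stored, unindexed field. FoldedIndexSketch, add, search and fold are hypothetical names, and java.text.Normalizer assumes Java 6 or later.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the two-field approach.
final class FoldedIndexSketch {

    // Folded form (what gets searched) -> original forms (what gets returned).
    private final Map<String, List<String>> index = new HashMap<String, List<String>>();

    // Strip accents: decompose to NFD, then remove the combining marks.
    static String fold(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // "Index time": key on the folded form, keep the original alongside it.
    void add(String original) {
        String key = fold(original);
        List<String> originals = index.get(key);
        if (originals == null) {
            originals = new ArrayList<String>();
            index.put(key, originals);
        }
        originals.add(original);
    }

    // "Search time": fold the query the same way, return the original forms.
    List<String> search(String query) {
        List<String> hits = index.get(fold(query));
        return hits == null ? new ArrayList<String>() : hits;
    }
}
```

With this, adding "métal" and then searching for either "metal" or "métal" returns the original "métal". In Lucene 2.3 the equivalent would presumably be one field built from the folded text and indexed, plus a second field holding the original text created with Field.Store.YES and Field.Index.NO.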


Hope this helps,

Matt

egrand thomas wrote:

Dear all,

I'd like my Lucene searches to be insensitive to (French) accents. For example, considering an indexed term
"métal", I want to get it when searching for "metal" or "métal". I use lucene-2.3.2 and
the searches are performed with: IndexSearcher.search(query,filter,sorter). Another filter is already used together
with a "Sort" object. Furthermore, I cannot use the FrenchAnalyzer as my index does not only contain French
words.

Can anybody help ?
Thanks in advance,
Tom



