Matching accented with non-accented characters

Rajan, Renuka Tue, 25 Jul 2006 08:41:17 -0700

Hi All


I am trying to match accented characters with non-accented characters in 
French/Spanish and other Western European languages.  The use case is that 
users may type letters without accents in error, and we still want to be able 
to retrieve valid matches.  One idea, albeit naïve, is to strip the accents 
from both the inbound query and the data in the database (prior to full-text 
indexing) and then match on the normalized forms.
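
To make the normalization idea concrete, here is a minimal sketch.  It assumes 
the standard java.text.Normalizer class is available and uses no Lucene API at 
all; the names AccentFolding, stripAccents and fold are purely illustrative.  
The same function would have to be applied to the text before indexing and to 
the user's query before searching.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds accented letters to their
// unaccented base letters so both sides of a comparison look the same.
final class AccentFolding {
    private AccentFolding() {}

    // Decompose each accented letter into base letter + combining mark (NFD),
    // then delete the combining marks.
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Strip accents and lower-case in a locale-independent way, so that
    // case differences do not prevent a match either.
    static String fold(String text) {
        return stripAccents(text).toLowerCase(Locale.ROOT);
    }
}
```

With this, fold() returns the same string for the fully accented word and for 
the word with an accent missing, so the two would match.  One caveat: letters 
that have no Unicode decomposition (for example ø, ß, æ) pass through 
unchanged and would need an explicit mapping table.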

 

For instance, if the database contains a word like BE/BE/ (/ being the 
equivalent of aigu since I don't have a French keyboard:-)) and the input is 
erroneously provided as BE/BE (last aigu missing), we still want to be able 
to retrieve BE/BE/ as a candidate match, admittedly with an error margin.  

 

Has anyone used Lucene successfully (i.e. with decent performance AND valid 
results) to match non-accented characters with accented ones?  Any method 
would be of interest.  Does anyone have suggestions to improve on the idea 
above?

 

Any input will be greatly appreciated! Merci beaucoup :-)

Renuka



