Re: Filtering accents

2008-12-30 Thread Otis Gospodnetic
----- Original Message -----
> From: legrand thomas
> To: java-user@lucene.apache.org
> Cc: francois.vanhi...@hotmail.fr
> Sent: Tuesday, December 30, 2008 8:52:57 AM
> Subject: Filtering accents
>
> Dear all,
>
> I'd like my lucene searches to be

Re: Filtering accents

2008-12-30 Thread Erick Erickson
You might want to take a look at using the ISOLatin1AccentFilter or similar at both index and query time. It basically folds accented characters into their unaccented form. Matthew: You wrote: <<>> I also did this before realizing that the second field is unnecessary. Storing is orthogonal to in

Re: Filtering accents

2008-12-30 Thread Greg Shackles
Just thought I'd comment since I had to do word processing before indexing in my application as well. Matt's method is pretty similar to what I did. I wrote a filter that transforms the tokens as they get indexed (and also use that for searching). Since I am indexing a block of words, rather than

Re: Filtering accents

2008-12-30 Thread Matthew Hall
If you are constrained in a way that prevents you from using the French Analyzer, you might instead consider transforming the input as an additional step at both indexing and search time. Use something like a regex that looks for é and always replaces it with e, both when building the index and at search time. (expand this
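
For illustration only, here is a minimal sketch of that kind of pre-processing step in plain Java, with no Lucene classes involved. It swaps the per-character regex for a more general technique: Unicode decomposition via java.text.Normalizer (Java 6 or later), followed by a regex that removes the combining marks. The class name AccentFolder and the method fold are hypothetical names chosen for this example, not part of Lucene or of anyone's code in this thread. The same method would have to be applied to the text before indexing and to the query string before searching.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of Lucene.
public class AccentFolder {

    // Returns the input with accents removed, e.g. "métal" becomes "metal".
    public static String fold(String input) {
        // NFD decomposition splits "é" into "e" plus U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid any dependence on the source file encoding.
        System.out.println(fold("m\u00e9tal")); // prints "metal"
        System.out.println(fold("metal"));      // prints "metal"
    }
}
```

Note that this only handles characters that decompose into a base letter plus combining marks. Ligatures such as œ and æ are left untouched and would need an explicit replacement.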

Filtering accents

2008-12-30 Thread legrand thomas
Dear all, I'd like my Lucene searches to be insensitive to (French) accents. For example, given an indexed term "métal", I want to find it when searching for either "metal" or "métal". I use lucene-2.3.2 and the searches are performed with: IndexSearcher.search(query,filter,sorter). Another filte