- Original Message
> From: legrand thomas
> To: java-user@lucene.apache.org
> Cc: francois.vanhi...@hotmail.fr
> Sent: Tuesday, December 30, 2008 8:52:57 AM
> Subject: Filtering accents
>
> Dear all,
>
> I'd like my lucene searches to be
You might want to take a look at using the ISOLatin1AccentFilter or similar
at both index and query time. It basically folds accented characters into
their unaccented form.
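To make "folding" concrete, here is a minimal sketch that uses only the JDK
(java.text.Normalizer), not the Lucene filter itself. The class name
AccentFolding and the method fold are hypothetical, and the approach
(decompose, then strip combining marks) is a stand-in for what an accent
filter does, not its actual implementation. Whatever does the folding has to
run on both the indexed text and the query text, otherwise the terms will not
match.

```java
import java.text.Normalizer;

// Hypothetical helper: illustrates accent folding with the JDK only.
class AccentFolding {

    // Decompose each accented character into base letter + combining mark (NFD),
    // then drop the combining marks so only the base letters remain.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "m\u00e9tal" is "metal" written with an e-acute.
        String indexedTerm = fold("m\u00e9tal"); // applied at index time
        String queryTerm = fold("metal");        // applied at query time
        System.out.println(indexedTerm.equals(queryTerm)); // prints "true"
    }
}
```

With the same folding on both sides, a search for either spelling ends up
looking for the same term.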
Matthew:
You wrote:
<<>>
I also did this before realizing that the second field is unnecessary.
Storing is orthogonal to in
Just thought I'd comment since I had to do word processing before indexing
in my application as well. Matt's method is pretty similar to what I did.
I wrote a filter that transforms the tokens as they get indexed (and also
use that for searching). Since I am indexing a block of words, rather than
If you are constrained in such a way that you cannot use the French Analyzer,
you might instead consider transforming the input as an additional step
at both indexing and search time.
Use something like a regex that looks for é and always replaces it with
e, both in the index and at search time. (expand this
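A rough sketch of that replacement step, assuming a small hand-written
mapping table. The class name RegexAccentReplacer, the method replaceAccents
and the table itself are hypothetical, and the table is deliberately
incomplete (lowercase only, a handful of French characters). The same method
would need to be called on the text before it is indexed and on the query
string before it is parsed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: regex-based replacement of a few accented characters.
class RegexAccentReplacer {

    // Each pattern matches a group of accented characters; the value is the
    // plain letter they are replaced with. Not an exhaustive table.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("[\u00e8\u00e9\u00ea\u00eb]"), "e"); // e with grave, acute, circumflex, diaeresis
        RULES.put(Pattern.compile("[\u00e0\u00e2]"), "a");             // a with grave, circumflex
        RULES.put(Pattern.compile("[\u00ee\u00ef]"), "i");             // i with circumflex, diaeresis
        RULES.put(Pattern.compile("\u00f4"), "o");                     // o with circumflex
        RULES.put(Pattern.compile("[\u00f9\u00fb\u00fc]"), "u");       // u with grave, circumflex, diaeresis
        RULES.put(Pattern.compile("\u00e7"), "c");                     // c with cedilla
    }

    // Apply every rule in turn; characters without a rule pass through unchanged.
    static String replaceAccents(String text) {
        String result = text;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceAccents("m\u00e9tal")); // prints "metal"
    }
}
```

A table like this is easy to audit but easy to leave incomplete, which is the
main trade-off against a general-purpose folding filter.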
Dear all,
I'd like my lucene searches to be insensitive to (French) accents. For example,
considering an indexed term "métal", I want to get it when searching for "metal"
or "métal". I use lucene-2.3.2 and the searches are performed with:
IndexSearcher.search(query,filter,sorter), Another filte