http://www.blardone.org/2008/10/12/lucene-query-accented-character/
It is specific to PHP, but the same approach can easily be used to solve the
same problem in Java.
I had the same problem as "Christophe from paris", and changing the query to
its HTML-encoded equivalent made my search queries work.
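
A minimal Java sketch of that idea, assuming "HTML encoded equivalent" means decimal numeric entities (the linked post may use named entities instead). The class and method names are hypothetical, and only the JDK is used. Presumably this only helps when the indexed text is stored in the same encoded form.

```java
final class HtmlEntityEncoder {

    // Replaces every code point above 127 with a decimal numeric entity
    // such as &#233; and leaves plain ASCII untouched.
    // Assumption: numeric entities, not named ones like &eacute;.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        System.out.println(encode("caf\u00e9")); // prints caf&#233;
    }
}
```

The encoded string would then be passed to the query parser in place of the raw user input.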
So Perh
: http://www.blardone.org/2008/10/12/lucene-query-accented-character/
That post appears to be specifically about a PHP function to convert UTF-8
characters to their HTML equivalents ... which doesn't really seem relevant
to the poster's question ...
: > I'm use FrenchAnalyzer for index
..
Does this:
http://www.blardone.org/2008/10/12/lucene-query-accented-character/
solve your problem?
Cheers,
lekamm
Christophe from paris wrote:
>
> Hello
>
> I'm use FrenchAnalyzer for index
>
> IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(),
> true);
> Document
Yes markrmiller, the order is important.
So then:
TokenStream result = new StandardTokenizer(reader);
result = new StandardFilter(result);
result = new StopFilter(result, stoptable);
result = new ISOLatin1AccentFilter(result);
result = new FrenchStemFilter(result, excltable);
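
Why the position of the accent filter matters can be sketched with a toy pipeline. This is not Lucene code: the stop list, the method names, and the use of java.text.Normalizer as a stand-in for ISOLatin1AccentFilter are all assumptions made for illustration. It shows that a stop list containing accented words only matches if stop-word removal runs before accent stripping.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class FilterOrderDemo {

    // Hypothetical stop list that contains an accented French word.
    static final Set<String> STOP_WORDS = Set.of("\u00e9t\u00e9", "le");

    // Stand-in for an accent filter: decompose, then drop combining marks.
    static String stripAccents(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    // Stop words are removed first, then accents are stripped.
    static List<String> stopThenFold(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (!STOP_WORDS.contains(t)) {
                out.add(stripAccents(t));
            }
        }
        return out;
    }

    // Accents are stripped first, so the accented stop word no longer matches.
    static List<String> foldThenStop(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            String folded = stripAccents(t);
            if (!STOP_WORDS.contains(folded)) {
                out.add(folded);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("le", "\u00e9t\u00e9", "\u00e9l\u00e8ve");
        System.out.println(stopThenFold(terms)); // prints [eleve]
        System.out.println(foldThenStop(terms)); // prints [ete, eleve]
    }
}
```

The same reasoning applies to the stemmer: whichever filter comes later sees the output of the earlier one, so its word lists and rules have to expect that form.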
You certainly can - just create your own Analyzer starting with a copy
of the French one you are using.
Then you just plug in the filter in the order you want it applied:
result = new ISOLatin1AccentFilter(result);
You have to decide for yourself where it will come - if you put it
before the
Actually, in my FrenchAnalyzer
I have:
TokenStream result = new StandardTokenizer(reader);
result = new StandardFilter(result);
result = new StopFilter(result, stoptable);
result = new FrenchStemFilter(result, excltable);
result = new LowerCaseFilter(result);
I can use ISOLati
Check out org.apache.lucene.analysis.ISOLatin1AccentFilter
It will strip diacritics - just be sure to use it at index time and
query time to get what you want. Also, you will no longer be able to
differentiate between the two in your searching (rarely that important
in my opinion, but others c
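
The advice above (strip diacritics at both index time and query time, at the cost of no longer telling accented and unaccented forms apart) can be sketched with a toy in-memory index. This is an illustration using only the JDK: the class is hypothetical, and java.text.Normalizer is an assumed stand-in whose mappings may differ from those of ISOLatin1AccentFilter.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FoldedIndexDemo {

    private final Set<String> terms = new HashSet<>();

    // Shared normalization: decompose, drop combining marks, lowercase.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Index time: store the folded form.
    void index(String term) {
        terms.add(fold(term));
    }

    // Query time: fold the query the same way before the lookup.
    boolean search(String query) {
        return terms.contains(fold(query));
    }

    public static void main(String[] args) {
        FoldedIndexDemo idx = new FoldedIndexDemo();
        idx.index("\u00e9l\u00e8ve");                    // "eleve" with accents
        System.out.println(idx.search("eleve"));          // prints true
        System.out.println(idx.search("\u00c9l\u00e8ve")); // prints true

        // The trade-off: two different words now collapse to one term.
        idx.index("c\u00f4te");                           // stored as "cote"
        System.out.println(idx.search("c\u00f4t\u00e9")); // prints true
    }
}
```

If only one side were folded, an accented query would miss an unaccented indexed term or the reverse, which is why the same step belongs in both the indexing and the query analyzer.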