Hi,
Unfortunately I can't change the way things are indexed, so I guess I
need some sort of utility class that will turn Martínez into
Mart&iacute;nez and then just search for that term.
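
A minimal sketch of such a utility class, assuming the index holds the HTML entity form (Mart&iacute;nez) and the user types the accented form. The class name EntityEncoder and its small entity table are hypothetical and cover only a few Spanish characters; a real table would need every entity that appears in the indexed HTML.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: rewrites accented characters in a search term
// as the named HTML entities that are assumed to be stored in the index.
final class EntityEncoder {
    private static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        // Only a handful of characters; extend as needed.
        ENTITIES.put('\u00e1', "&aacute;"); // a with acute
        ENTITIES.put('\u00e9', "&eacute;"); // e with acute
        ENTITIES.put('\u00ed', "&iacute;"); // i with acute
        ENTITIES.put('\u00f3', "&oacute;"); // o with acute
        ENTITIES.put('\u00fa', "&uacute;"); // u with acute
        ENTITIES.put('\u00f1', "&ntilde;"); // n with tilde
    }

    // Returns the term with every known accented character replaced
    // by its entity; all other characters are copied unchanged.
    static String encode(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The encoded term still has to survive the analyzer and query parser, which may split on "&" and ";", so this only helps if the query is analyzed the same way the documents were.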
I also have this problem using the StandardAnalyzer:
if I search for "c&aacute;diz" in Luke, the query gets parsed as
"c&aacute diz"; it changes ";" to " ". I would like to know which
symbols cannot be used in a search and how to escape them.
What would be the right query to find "c&aacute;diz"?
Thanks.
Raul
Jens Kraemer wrote:
Hi!
On Thu, Mar 09, 2006 at 04:31:23AM -0800, Raul Raja Martinez wrote:
Hi, I have a lot of HTML indexed such as:
Mart&iacute;nez
Of course my users are gonna search for Martínez and they're not gonna
get a match.
Is there a common approach to solving this kind of problem in Lucene,
maybe some utility class or something?
There is a class named Entities in Jakarta commons-lang, which can
be used to resolve HTML entities before indexing. Maybe you could
integrate this into a custom analyzer.
http://jakarta.apache.org/commons/lang/xref/org/apache/commons/lang/Entities.html
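
To illustrate the idea of resolving entities before indexing, here is a rough, dependency-free sketch. The class name EntityDecoder and its tiny entity table are hypothetical and are not the commons-lang implementation; in practice a library call such as commons-lang's StringEscapeUtils.unescapeHtml would probably be the better choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces HTML entities with the characters they
// stand for, so the text can be handed to the analyzer in plain form.
final class EntityDecoder {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        // Small illustrative table; a real one would be much longer.
        NAMED.put("aacute", "\u00e1");
        NAMED.put("eacute", "\u00e9");
        NAMED.put("iacute", "\u00ed");
        NAMED.put("oacute", "\u00f3");
        NAMED.put("uacute", "\u00fa");
        NAMED.put("ntilde", "\u00f1");
        NAMED.put("amp", "&");
    }

    // Matches named entities (&iacute;) and decimal ones (&#237;).
    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d{1,6}|[A-Za-z]+);");

    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = null;
            if (name.startsWith("#")) {
                int codePoint = Integer.parseInt(name.substring(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } else {
                replacement = NAMED.get(name);
            }
            if (replacement == null) {
                replacement = m.group(); // unknown entity: leave as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling EntityDecoder.decode on each field's text before it is added to the document would index "Martínez" instead of the entity form, so a plain search for the accented term can match.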
regards,
Jens