Hi,

Unfortunately I can't change the way things are indexed, so I guess I need some short of utility class that will turn Martínez into Martínez and then just search for that term.

I have also this problem using the StandardAnalyzer:

If I search for "cádiz" in luke the query gets parsed as "c&aacute diz" it changes ";" for " ". I would like to know which symbols are not to be used in the search and how to scape them.

What would be the right query to find "cádiz"?

Thanks.

Raul

Jens Kraemer wrote:
Hi!

On Thu, Mar 09, 2006 at 04:31:23AM -0800, Raul Raja Martinez wrote:
Hi I have a lot of html indexed such as:

Martínez

Of course my users are gonna search for Martínez and they're not gonna get a match.

Is there a common approach to solve this kind of problem in lucene, Maybe some utility class or something?

there is a class named Entities in Jakarta commons-lang, which can
be used to resolve html entities before indexing. Maybe you could
integrate this into a custom analyzer.

http://jakarta.apache.org/commons/lang/xref/org/apache/commons/lang/Entities.html

regards,
Jens



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to