Better analysis of hyphenated words

Rob Young Thu, 27 Oct 2005 08:14:18 -0700

Hi,

I'm using StandardAnalyzer during indexing and I have noticed that it splits hyphenated words in two, ditching the hyphen. This is messing up some of my search results. I would like to keep using StandardAnalyzer because it's very good on the whole, however I would like to add an extra term in these cases. I am fine doing everything except figuring out when StandardTokenizer has split a hyphenated word. All I get is the individual tokens with a type ALPHANUM. Can anyone think of a way I can do this without having to dive into StandardTokenizer?

I have looked at the source for StandardTokenizer and I really really really don't want to have to go there :/
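
One way this might be approached without touching StandardTokenizer is to compare the character offsets of neighbouring tokens against the original text: if two adjacent tokens are separated by exactly one character and that character is a hyphen, they presumably came from one hyphenated word. The sketch below only illustrates that idea. HyphenJoiner and Tok are hypothetical names, not Lucene classes; Tok stands in for an analyzer token that carries its term text plus start and end offsets, and the code assumes those offsets index into the same string that was analyzed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class HyphenJoiner {

    // Minimal stand-in for an analyzer token: term text plus character
    // offsets into the original input (end is exclusive).
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Returns a joined term ("left-right") for every pair of adjacent tokens
    // that are separated in the source text by exactly one hyphen.
    static List<String> hyphenatedTerms(String source, List<Tok> tokens) {
        List<String> extra = new ArrayList<String>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            Tok left = tokens.get(i);
            Tok right = tokens.get(i + 1);
            boolean oneCharGap = right.start - left.end == 1;
            if (oneCharGap && source.charAt(left.end) == '-') {
                // Join the token texts rather than slicing the source, so any
                // normalization already applied to the tokens is preserved.
                extra.add(left.text + "-" + right.text);
            }
        }
        return extra;
    }
}
```

In a real analysis chain the same check would probably live in a filter wrapped around the tokenizer's output, emitting the joined term as an extra token next to the two parts. That wiring is left out here because it depends on the Lucene version in use.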


Cheers
Rob
