Op Saturday 17 May 2008 20:28:40 schreef Karl Wettin:
> As far as I know Lucene only handle single word synonyms at index
> time. My life would be much simpler if it was possible to add
> synonyms that spanned over multiple tokens, such as "lucene in
> action"="lia". I have a couple of workarounds that are OK but it
> really isn't the same thing when it comes down to the scoring.
>
> The thing that does the best job at scoring was to assemble several
> permutations of the same document. But it doesn't feel good. I have
> cases where that means several hundred documents, and I have to do
> post processing to filter out the "duplicate" hits. It can turn out
> to be rather expensive. And I'm sure it mess with the scoring in
> several ways I did not notice yet.
>
> I've also considering creating some multi dimensional term position
> space, but I'd say that could take a lot of time to implement.
>
> Are there any good solutions to this?

The simplest solution is to index such synonyms at the first or last
or middle position of the source tokens, using a zero position
increment for the synonym. Was this one of the workarounds?

The advantage of the zero position increment is that the original
token positions are not affected, so at least there is no influence
on scoring because of changes in the original token positions.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to