[ 
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870593#action_12870593
 ] 

Robert Muir commented on LUCENE-1622:
-------------------------------------

bq. There are tricky tradeoffs of index time vs search time

The worst tradeoff of all is that users can't make it.

For this reason, among others, we should start thinking about removing 
QueryParser's split-on-whitespace behavior.
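
To illustrate why splitting on whitespace takes the search-time option away, 
here is a minimal sketch in plain Java. It does not use any Lucene API: the 
class, the single "big apple" -> "new york city" rule, and the method names 
are all hypothetical, and the toy analyze() step rewrites the match instead 
of stacking tokens. The only point is that a multi-word rule cannot fire when 
each whitespace-separated chunk is analyzed on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitOnWhitespaceSketch {
    // Hypothetical single synonym rule: "big apple" -> "new york city".
    static final List<String> MATCH = Arrays.asList("big", "apple");
    static final List<String> REPLACEMENT = Arrays.asList("new", "york", "city");

    // Toy stand-in for an analysis chain with a multi-word synonym step.
    static List<String> analyze(String text) {
        List<String> tokens = Arrays.asList(text.trim().toLowerCase().split("\\s+"));
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = i + MATCH.size();
            if (end <= tokens.size() && tokens.subList(i, end).equals(MATCH)) {
                out.addAll(REPLACEMENT);   // the whole multi-word match was visible
                i = end;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    // Parser that splits on whitespace first: analyze() only ever sees one word.
    static List<String> parseSplitting(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    // Parser that hands the whole query text to the analysis chain.
    static List<String> parseWhole(String query) {
        return analyze(query);
    }
}
```

With this sketch, parseSplitting("big apple restaurants") returns 
[big, apple, restaurants], while parseWhole("big apple restaurants") returns 
[new, york, city, restaurants].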


> Multi-word synonym filter (synonym expansion at indexing time).
> ---------------------------------------------------------------
>
>                 Key: LUCENE-1622
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1622
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: synonyms.patch
>
>
> It would be useful to have a filter that provides support for indexing-time 
> synonym expansion, especially for multi-word synonyms (with multi-word 
> matching for original tokens).
> The problem is not trivial, as observed on the mailing list. The problems I 
> was able to identify (mentioned in the unit tests as well):
> - if multi-word synonyms are indexed together with the original token stream 
> (at overlapping positions), then a query for a partial synonym sequence 
> (e.g., "big" in the synonym "big apple" for "new york city") causes the 
> document to match;
> - there are problems with highlighting the original document when a synonym is 
> matched (see unit tests for an example);
> - if the synonym is of a different length than the original sequence of tokens 
> to be matched, then phrase queries spanning the synonym and the original 
> sequence boundary won't be found. Example: "big apple" as a synonym for "new 
> york city". A phrase query "big apple restaurants" won't match "new york city 
> restaurants".
> I am posting the patch that implements phrase synonyms as a token filter. 
> This is not necessarily intended for immediate inclusion, but may provide a 
> basis for many people to experiment and adjust to their own scenarios.
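
The first and third problems in the description above can be reproduced with 
a small position-based model. This is a sketch in plain Java, not the 
attached patch and not Lucene code: the class and method names are 
hypothetical, and the layout (the synonym "big apple" stacked on positions 0 
and 1 of "new york city restaurants") is an assumption about what 
"overlapping positions" looks like in the index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class OverlappingSynonymSketch {
    // Build term -> positions for one document; several terms may share a position.
    static Map<String, Set<Integer>> index(String[][] tokensByPosition) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < tokensByPosition.length; pos++) {
            for (String term : tokensByPosition[pos]) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(pos);
            }
        }
        return postings;
    }

    static boolean matchesTerm(Map<String, Set<Integer>> postings, String term) {
        return postings.containsKey(term);
    }

    // Exact phrase match: term i must occur at position start + i.
    static boolean matchesPhrase(Map<String, Set<Integer>> postings, String... terms) {
        Set<Integer> starts = postings.getOrDefault(terms[0], Collections.emptySet());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], Collections.emptySet()).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Assumed layout: "big apple" injected over the first two original positions.
    static Map<String, Set<Integer>> exampleDocument() {
        return index(new String[][] {
            {"new", "big"},
            {"york", "apple"},
            {"city"},
            {"restaurants"}
        });
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> doc = exampleDocument();
        System.out.println(matchesTerm(doc, "big"));                                  // true
        System.out.println(matchesPhrase(doc, "new", "york", "city", "restaurants")); // true
        System.out.println(matchesPhrase(doc, "big", "apple", "restaurants"));        // false
    }
}
```

In this model the partial synonym "big" matches the document, and the phrase 
"big apple restaurants" fails because "restaurants" sits at position 3 while 
the two-token synonym ends at position 1. The same layout also lets mixed 
phrases such as "big apple city" match.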

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


