[
https://issues.apache.org/jira/browse/SOLR-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919942#action_12919942
]
Jan Høydahl commented on SOLR-2150:
-----------------------------------
What you describe is also a useful feature. I think of it as something even
more generic: a place to configure detection of various patterns and apply
some action to the query based on the match, whether that is fetching a
weather forecast from an API, performing a calculation, or rewriting the query
to apply a filter. I think it deserves its own feature request; later in the
design phase, one could decide whether the same code base could power parts of
both.
> Anti-phrasing feature
> ---------------------
>
> Key: SOLR-2150
> URL: https://issues.apache.org/jira/browse/SOLR-2150
> Project: Solr
> Issue Type: New Feature
> Components: SearchComponents - other
> Reporter: Jan Høydahl
>
> Add an anti-phrasing feature to Solr.
> Definition: Identifying word sequences in queries that do not contribute
> essentially to the query's meaning, such as "Where can I find" or "Where is."
> (Source: http://www.google.com/search?q=define%3Aanti+phrasing)
> For general-purpose search services, such as web, intranet, or shopping
> search, some users will try to write a question to the search engine, such
> as "how much is an ipod nano". One straightforward way of limiting the
> number of zero-hit queries in such environments is to apply anti-phrasing,
> which uses a dictionary of common sentence prefixes that should be stripped
> from the incoming query before it is passed on to search.
> This can be implemented as a Search Component in Solr. The dictionary can be
> language independent. We can encourage users to submit their tested
> anti-phrasing dictionaries for various languages, and include those. The
> dictionary can be a set of simple .txt files, loaded into memory at startup
> in an efficient data structure such as a B-tree or finite state automaton to
> avoid redundancy and ensure quick matching. The procedure for detecting an
> anti-phrase in the incoming query is to first look up the full query phrase;
> if there is no match, remove a word from the end and do another lookup,
> repeating until there is either a match or no words are left. Example for
> the query "Who is Einstein?", where "Who is" is defined as an anti-phrase:
> 1. Look up "Who is Einstein" (no match)
> 2. Look up "Who is" (match), remove this prefix
> 3. Issue the query "Einstein" to search
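
The lookup procedure quoted above could be sketched roughly as follows. This
is only an illustration in Python, not the proposed Solr component: the
function names (load_anti_phrases, strip_anti_phrase) are hypothetical, a
plain set of token tuples stands in for the B-tree or automaton, and the
tokenizer is a deliberately crude assumption that drops punctuation. Returning
the query unchanged when the whole query is an anti-phrase is likewise an
assumption; the issue description does not say what should happen in that case.

```python
import re


def tokenize(text):
    """Split text into word tokens, dropping punctuation (crude assumption)."""
    return re.findall(r"\w+", text)


def load_anti_phrases(lines):
    """Build a set of lowercased token tuples, one anti-phrase per input line.

    `lines` can be any iterable of strings, e.g. an open .txt file.
    """
    phrases = set()
    for line in lines:
        tokens = tuple(t.lower() for t in tokenize(line))
        if tokens:
            phrases.add(tokens)
    return phrases


def strip_anti_phrase(query, phrases):
    """Remove the longest matching anti-phrase prefix from the query."""
    tokens = tokenize(query)
    # Start with the full query, then drop one word from the end per lookup.
    for end in range(len(tokens), 0, -1):
        prefix = tuple(t.lower() for t in tokens[:end])
        if prefix in phrases:
            remainder = " ".join(tokens[end:])
            # Assumption: never issue an empty query; fall back to the original.
            return remainder if remainder else query
    return query


phrases = load_anti_phrases(["Who is", "Where can I find", "How much is an"])
print(strip_anti_phrase("Who is Einstein?", phrases))
print(strip_anti_phrase("how much is an ipod nano", phrases))
```

With a set, each query costs at most one lookup per word. A trie or automaton
would let all prefixes be checked in a single pass over the query, which is
presumably the motivation for the data structures mentioned in the issue.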