[ 
https://issues.apache.org/jira/browse/SOLR-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919942#action_12919942
 ] 

Jan Høydahl commented on SOLR-2150:
-----------------------------------

What you describe is also a useful feature. I think of it as something even 
more generic: a place to configure detection of various patterns and apply 
some action to the query based on the match, whether that is fetching a 
weather forecast from an API, performing a calculation, or rewriting the query 
to apply a filter. I think it deserves its own feature request; one could then 
decide later, in the design phase, whether the same code base could power 
parts of both.

> Anti-phrasing feature
> ---------------------
>
>                 Key: SOLR-2150
>                 URL: https://issues.apache.org/jira/browse/SOLR-2150
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>            Reporter: Jan Høydahl
>
> Add an anti-phrasing feature to Solr.
> Definition: Identifying word sequences in queries that do not contribute 
> essentially to the query's meaning, such as "Where can I find" or "Where is."
> (Source: http://www.google.com/search?q=define%3Aanti+phrasing)
> For general-purpose search services, such as web, intranet, or shopping 
> search, some users will try to write a question to the search engine, such 
> as "how much is an ipod nano". One straightforward way of limiting the 
> number of zero-hit queries in such environments is to apply anti-phrasing, 
> which uses a dictionary of common sentence prefixes that should be stripped 
> from the incoming query before it is passed on to search.
> This can be implemented as a Search Component in Solr. The dictionary can be 
> language independent. We can encourage users to submit their tested 
> anti-phrasing dictionaries for various languages, and include those. The 
> dictionary can be a set of simple .txt files, loaded in memory at startup in 
> an efficient data structure such as b-tree or finite state automaton to avoid 
> redundancy and ensure quick matching. The procedure for detecting an 
> anti-phrase in the incoming query is to first look up the full query phrase; 
> if there is no match, remove a word from the end and look up again, repeating 
> until there is either a match or no words are left. Example for the query 
> "Who is Einstein?", where "Who is" is defined as an anti-phrase:
> 1. Look up "Who is Einstein" (no match)
> 2. Look up "Who is" (match), remove this prefix
> 3. Issue the query "Einstein" to search
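
The lookup procedure quoted above can be sketched roughly as follows. This is 
only an illustration in Python, not the proposed Solr Search Component: the 
function name strip_anti_phrase, the sample ANTI_PHRASES set, and the 
lowercase/word-token normalization are all assumptions made for the example, 
and a plain in-memory set stands in for the trie or automaton that a real 
dictionary would probably use.

```python
import re

# Hypothetical sample dictionary; a real one would be loaded from .txt files.
ANTI_PHRASES = {"who is", "where is", "where can i find", "how much is"}


def strip_anti_phrase(query, anti_phrases=ANTI_PHRASES):
    """Return the query with its longest matching anti-phrase prefix removed."""
    # Assumption: words are runs of word characters; punctuation is ignored.
    original = re.findall(r"\w+", query)
    words = [w.lower() for w in original]

    # Look up the full query first, then drop one word from the end per round.
    for end in range(len(words), 0, -1):
        if " ".join(words[:end]) in anti_phrases:
            remainder = original[end:]
            # Assumption: if stripping would leave nothing, keep the query as is.
            return " ".join(remainder) if remainder else query

    # No prefix matched: pass the query through unchanged.
    return query


if __name__ == "__main__":
    print(strip_anti_phrase("Who is Einstein?"))
    print(strip_anti_phrase("how much is an ipod nano"))
```

Because the lookup starts from the full query and shortens it from the end, 
the first hit is always the longest matching prefix. Each round costs one 
lookup, so a query of n words needs at most n lookups.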

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

