[ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Dyer updated SOLR-2010:
-----------------------------
Attachment: SOLR-2010.patch
Third version (now with a ".patch" extension; I had used a ".txt" extension
for the 2nd version). Works with trunk rev#986945.
This time, SpellCheckCollator calls the SearchHandler instead of calling the
QueryComponent. This required exposing a reference to the SearchHandler on the
ResponseBuilder. Also, a new overload of SearchHandler.processRequestBody()
lets you override the list of components to run. In this case, it runs only
the QueryComponent.
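To make the idea of overriding the component list concrete, here is a minimal plain-Java sketch. MiniComponent, MiniSearchHandler and OverrideDemo are hypothetical stand-ins, not Solr's SearchComponent, ResponseBuilder or SearchHandler classes, and the real overload in the patch may have a different signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search component (not Solr's SearchComponent API).
interface MiniComponent {
    void process(List<String> trace);
}

// Hypothetical handler: by default it runs every configured component, but an
// overload lets the caller supply a different list of components to run.
class MiniSearchHandler {
    private final List<MiniComponent> components;

    MiniSearchHandler(List<MiniComponent> components) {
        this.components = components;
    }

    // Default behavior: run all configured components in order.
    List<String> processRequestBody() {
        return processRequestBody(components);
    }

    // Overload: run only the components passed in, e.g. just a query component.
    List<String> processRequestBody(List<MiniComponent> override) {
        List<String> trace = new ArrayList<>();
        for (MiniComponent c : override) {
            c.process(trace);
        }
        return trace;
    }
}

class OverrideDemo {
    // Builds a component that just records its name when it runs.
    static MiniComponent named(String name) {
        return trace -> trace.add(name);
    }

    public static void main(String[] args) {
        MiniComponent query = named("query");
        MiniComponent spellcheck = named("spellcheck");
        MiniSearchHandler handler = new MiniSearchHandler(List.of(query, spellcheck));
        System.out.println(handler.processRequestBody());               // [query, spellcheck]
        System.out.println(handler.processRequestBody(List.of(query))); // [query]
    }
}
```

Passing a one-element list mirrors the case described above, where the collator only needs the query component to run.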
This revision has 2 potential benefits:
(1) The overloaded method in SearchHandler may prove useful to other
components in the future.
(2) There may be a way to get SearchHandler to requery all the shards at once,
which would remove the need to reintegrate the collations in
SearchHandler.finishStage(). However, see my comment in SpellCheckCollator
lines 56-57. I am likely calling SpellCheckCollator during the wrong "stage"
of the distributed request, but I need to find out more specifically how
shards work to determine how to improve this further. As time allows I will
investigate on my own, but anyone's advice would be greatly appreciated.
Finally, this version corrects a bug that would have caused one of the test
scenarios in DistributedSpellCheckComponentTest to fail. Unfortunately, in the
2nd version I had left some scenarios commented out and did not catch this
until now.
> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
> Key: SOLR-2010
> URL: https://issues.apache.org/jira/browse/SOLR-2010
> Project: Solr
> Issue Type: New Feature
> Components: clients - java, spellchecker
> Affects Versions: 1.4.1
> Environment: Tested against trunk revision 966633
> Reporter: James Dyer
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch,
> SOLR-2010.txt
>
>
> Our project requires a better spell check collator. I'm contributing this as
> a patch to get suggestions for improvements, and in case there is a broader
> need for these features.
> 1. Only return collations that are guaranteed to result in hits if
> re-queried (also applying the original fq params). This is especially
> helpful when there is more than one correction per query. The 1.4 behavior
> does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results, including the number of hits
> re-querying will return and a breakdown of each misspelled word and its
> correction.
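The core of items 1 and 2 can be sketched as: enumerate combinations of corrections, re-query each candidate, keep only those that return hits, and stop after a fixed number of attempts or once enough collations are found. This is a simplified model under stated assumptions: the corrections for each misspelled word are given in ranked order, combinations are tried in plain odometer order (not necessarily the ordering the patch uses), and hitCount is a hypothetical callback standing in for the re-query with the original fq params. The sketch only models the checked case (maxCollationTries greater than 0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class CollationSketch {

    // Returns up to maxCollations combinations that produce hits, trying at most
    // maxCollationTries combinations. Each combination picks one correction per
    // misspelled word. hitCount is a hypothetical stand-in for re-querying.
    static List<List<String>> collate(List<List<String>> corrections,
                                      int maxCollationTries,
                                      int maxCollations,
                                      ToIntFunction<List<String>> hitCount) {
        List<List<String>> results = new ArrayList<>();
        if (corrections.isEmpty()) return results;
        for (List<String> c : corrections) {
            if (c.isEmpty()) return results; // a word with no correction: nothing to collate
        }
        int[] idx = new int[corrections.size()]; // current choice for each word
        int tries = 0;
        while (tries < maxCollationTries && results.size() < maxCollations) {
            List<String> candidate = new ArrayList<>();
            for (int i = 0; i < idx.length; i++) {
                candidate.add(corrections.get(i).get(idx[i]));
            }
            tries++;
            if (hitCount.applyAsInt(candidate) > 0) {
                results.add(candidate); // only keep collations that return hits
            }
            // Advance the odometer: the last word's choice changes fastest.
            int pos = idx.length - 1;
            while (pos >= 0) {
                idx[pos]++;
                if (idx[pos] < corrections.get(pos).size()) break;
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) break; // every combination has been tried
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> corrections = List.of(
            List.of("hope", "how"),    // candidates for "hopq"
            List.of("fall", "fails")); // candidates for "faill"
        // Hypothetical re-query: only "how AND fails" matches any documents.
        ToIntFunction<List<String>> hitCount =
            c -> c.equals(List.of("how", "fails")) ? 2 : 0;
        System.out.println(collate(corrections, 10, 1, hitCount)); // [[how, fails]]
        System.out.println(collate(corrections, 3, 1, hitCount));  // []
    }
}
```

The second call shows the trade-off behind a low try limit: the search gives up after three combinations, before reaching the one combination that has hits.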
> This patch is similar to what is described in SOLR-507 item #1. It also
> provides a viable workaround for the problem discussed in SOLR-1074: a
> dictionary could be created that combines the terms from multiple fields,
> and the collator would then prune out any spurious suggestions this causes.
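The SOLR-1074 workaround can be illustrated with a toy model: suggestions come from a dictionary that merges the terms of several fields, and a collation survives only if re-querying the target field actually matches. The field contents, the in-memory "index", and the AND-style matching below are hypothetical simplifications, not how Solr builds spellcheck dictionaries or executes queries.

```java
import java.util.*;

class CombinedDictionarySketch {
    // Hypothetical per-field "index": field name -> documents, each a set of terms.
    static final Map<String, List<Set<String>>> INDEX = Map.of(
        "Title", List.of(Set.of("hope", "faith"), Set.of("how", "fails")),
        "Author", List.of(Set.of("hoppe")));

    // Dictionary built from the union of the terms in all fields.
    static Set<String> combinedDictionary() {
        Set<String> terms = new TreeSet<>();
        for (List<Set<String>> docs : INDEX.values()) {
            for (Set<String> doc : docs) terms.addAll(doc);
        }
        return terms;
    }

    // Number of documents in `field` containing every term (AND semantics).
    static int hits(String field, List<String> terms) {
        int n = 0;
        for (Set<String> doc : INDEX.getOrDefault(field, List.of())) {
            if (doc.containsAll(terms)) n++;
        }
        return n;
    }

    // Drops candidate collations that return no hits in the target field.
    static List<List<String>> prune(String field, List<List<String>> candidates) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> c : candidates) {
            if (hits(field, c) > 0) kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "hoppe" exists only in Author, but the combined dictionary can still suggest it.
        System.out.println(combinedDictionary()); // [fails, faith, hope, hoppe, how]
        List<List<String>> candidates = List.of(
            List.of("hoppe", "faith"),
            List.of("hope", "faith"));
        System.out.println(prune("Title", candidates)); // [[hope, faith]]
    }
}
```

The spurious "hoppe" suggestion is harmless here because the Title re-query finds no document containing it, so that collation is pruned.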
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum number of collation possibilities
> to try before giving up. Lower values ensure better performance. Higher
> values may be necessary to find a collation that can return results. Default
> is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum number of collations to return. Default
> is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response
> format detailing the collations found. Default is false, which maintains
> backwards-compatible behavior. When true, the output looks like this (in
> context):
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="hopq">
>       <int name="numFound">94</int>
>       <int name="startOffset">7</int>
>       <int name="endOffset">11</int>
>       <arr name="suggestion">
>         <str>hope</str>
>         <str>how</str>
>         <str>hope</str>
>         <str>chops</str>
>         <str>hoped</str>
>         <!-- etc. -->
>       </arr>
>     </lst>
>     <lst name="faill">
>       <int name="numFound">100</int>
>       <int name="startOffset">16</int>
>       <int name="endOffset">21</int>
>       <arr name="suggestion">
>         <str>fall</str>
>         <str>fails</str>
>         <str>fail</str>
>         <str>fill</str>
>         <str>faith</str>
>         <str>all</str>
>         <!-- etc. -->
>       </arr>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(how AND fails)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">how</str>
>         <str name="faill">fails</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(hope AND faith)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">hope</str>
>         <str name="faill">faith</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(chops AND all)</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">chops</str>
>         <str name="faill">all</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
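A client-side view of the three new parameters and their documented defaults could look like the following sketch. The CollateParams class and the plain Map of request parameters are hypothetical; only the parameter names and default values come from the list above.

```java
import java.util.Map;

// Hypothetical holder for the collation parameters, using the defaults
// described above: 0 tries, 1 collation, no extended result.
final class CollateParams {
    final int maxCollationTries;
    final int maxCollations;
    final boolean collateExtendedResult;

    CollateParams(Map<String, String> params) {
        maxCollationTries = Integer.parseInt(
            params.getOrDefault("spellcheck.maxCollationTries", "0"));
        maxCollations = Integer.parseInt(
            params.getOrDefault("spellcheck.maxCollations", "1"));
        collateExtendedResult = Boolean.parseBoolean(
            params.getOrDefault("spellcheck.collateExtendedResult", "false"));
    }

    // With maxCollationTries == 0, collations are not checked against the index
    // (the backwards-compatible behavior).
    boolean verifiesCollations() {
        return maxCollationTries > 0;
    }
}

class CollateParamsDemo {
    public static void main(String[] args) {
        CollateParams defaults = new CollateParams(Map.of());
        System.out.println(defaults.verifiesCollations()); // false
        CollateParams tuned = new CollateParams(Map.of(
            "spellcheck.maxCollationTries", "10",
            "spellcheck.maxCollations", "3",
            "spellcheck.collateExtendedResult", "true"));
        System.out.println(tuned.maxCollations + " " + tuned.verifiesCollations()); // 3 true
    }
}
```

Because every default reproduces the 1.4 behavior, a request that omits all three parameters behaves as before.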
> In addition, SolrJ is updated to include
> SpellCheckResponse.getCollatedResults(), which returns the expanded
> collation format. getCollatedResult(), which returns a single String, is
> retained for backwards compatibility. Other APIs are unchanged and will
> still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results when using shards. That would
> require a more robust interaction with the index than what currently exists
> in SpellCheckCollator.collate().
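To show how the expanded collation format maps onto a client-side structure, and how a single-string accessor can stay backwards compatible, here is a self-contained sketch. ExtendedCollation and CollationResults are hypothetical classes whose method names echo the SolrJ methods described above; they are not SolrJ's implementation. The data reproduces the first two collations from the example response.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mirror of one <lst name="collation"> entry in the extended format.
final class ExtendedCollation {
    final String collationQuery;
    final int hits;
    final Map<String, String> misspellingsAndCorrections; // misspelling -> correction

    ExtendedCollation(String collationQuery, int hits, Map<String, String> corrections) {
        this.collationQuery = collationQuery;
        this.hits = hits;
        this.misspellingsAndCorrections = corrections;
    }
}

// Hypothetical result holder (not SolrJ's SpellCheckResponse).
final class CollationResults {
    private final List<ExtendedCollation> collations = new ArrayList<>();

    void add(ExtendedCollation c) { collations.add(c); }

    // Expanded view: every collation with its hit count and corrections.
    List<ExtendedCollation> getCollatedResults() {
        return Collections.unmodifiableList(collations);
    }

    // Single-string view for older callers: the first collation's query, or null.
    String getCollatedResult() {
        return collations.isEmpty() ? null : collations.get(0).collationQuery;
    }
}

class CollationResultsDemo {
    public static void main(String[] args) {
        CollationResults results = new CollationResults();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("hopq", "how");
        first.put("faill", "fails");
        results.add(new ExtendedCollation("Title:(how AND fails)", 2, first));
        Map<String, String> second = new LinkedHashMap<>();
        second.put("hopq", "hope");
        second.put("faill", "faith");
        results.add(new ExtendedCollation("Title:(hope AND faith)", 2, second));

        System.out.println(results.getCollatedResult()); // Title:(how AND fails)
        for (ExtendedCollation c : results.getCollatedResults()) {
            System.out.println(c.collationQuery + " hits=" + c.hits + " " + c.misspellingsAndCorrections);
        }
    }
}
```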
--
This message is automatically generated by JIRA.