[
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193606#comment-15193606
]
Hoss Man commented on SOLR-445:
-------------------------------
(Meant to post this last Friday but was blocked by the JIRA outage.)
OK ... I think things are looking pretty good on the jira/SOLR-445 branch --
good enough that I'd really like some help reviewing the code & sanity checking
the API (and the internals, for anyone who is up for it)...
----
For folks who haven't been following closely, here's what the configuration
looks like (from the javadocs)...
{code}
<processor class="solr.TolerantUpdateProcessorFactory">
  <int name="maxErrors">10</int>
</processor>
{code}
When a chain with this processor is used, but maxErrors isn't exceeded, here's
what the response looks like...
{code}
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=-1' \
    -H "Content-Type: application/json" \
    --data-binary '{"add": {"doc": {"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n <RANGE_QUOTED> ...\n <RANGE_GOOP> ...\n "}],
    "maxErrors":-1,
    "status":0,
    "QTime":1}}
{code}
Note in the above example that:
* maxErrors can be overridden on a per-request basis
* an effective {{maxErrors==-1}} (either from config, or request param) means
"unlimited" (under the covers it's using {{Integer.MAX_VALUE}})
If/when maxErrors is exceeded for a request, the _first_ exception that the
processor caught is propagated back to the user, and metadata is set on that
exception with the same details about all of the tolerated errors.
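The maxErrors rules above can be sketched roughly as follows. This is only an illustration, not the processor's actual code: {{effective_max_errors}}, {{run_tolerantly}} and {{TooManyErrors}} are hypothetical names, and the sketch assumes a request fails once the error count goes past maxErrors, which is what the {{maxErrors=1}} example in this message suggests (both errors are recorded before the request fails).

```python
INTEGER_MAX_VALUE = 2**31 - 1  # value of Java's Integer.MAX_VALUE


class TooManyErrors(Exception):
    """Hypothetical stand-in for the exception propagated to the client."""

    def __init__(self, first_error, tolerated):
        super().__init__(str(first_error))
        self.tolerated = tolerated  # details of every error seen so far


def effective_max_errors(configured, requested=None):
    # A per-request maxErrors param, when present, overrides the config value.
    value = configured if requested is None else requested
    # -1 means "unlimited", represented as Integer.MAX_VALUE.
    return INTEGER_MAX_VALUE if value == -1 else value


def run_tolerantly(commands, max_errors):
    """Run (type, id, action) commands; return the list of tolerated errors."""
    tolerated = []
    first_error = None
    for cmd_type, cmd_id, action in commands:
        try:
            action()
        except Exception as exc:
            if first_error is None:
                first_error = exc
            tolerated.append({"type": cmd_type, "id": cmd_id, "message": str(exc)})
            # Assumption: the request fails once the count exceeds the limit.
            if len(tolerated) > max_errors:
                raise TooManyErrors(first_error, tolerated) from exc
    return tolerated
```

The exception carries the message of the first failure while still exposing every recorded error, mirroring the "first exception plus metadata" behavior described above.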
This next example repeats the request from the first example, except that
instead of {{maxErrors=-1}} the request param is now {{maxErrors=1}}...
{code}
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=1' \
    -H "Content-Type: application/json" \
    --data-binary '{"add": {"doc": {"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n <RANGE_QUOTED> ...\n <RANGE_GOOP> ...\n "}],
    "maxErrors":1,
    "status":400,
    "QTime":1},
  "error":{
    "metadata":[
      "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
      "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n <RANGE_QUOTED> ...\n <RANGE_GOOP> ...\n ",
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
    "code":400}}
{code}
...the added exception metadata ensures that even in client code like the
various SolrJ SolrClient implementations, which throw a (client side) exception
on non-200 responses, the end user can access info on all the errors that were
tolerated before the maxErrors threshold was exceeded.
CloudSolrClient in particular -- which already has logic to split
{{UpdateRequests}}; route individual commands to the appropriate leaders; and
merge the responses -- has been updated to handle merging these responses as
well.
(The {{ToleratedUpdateError}} class for modeling these types of errors has been
added to solr-common, and has static utilities that client code can use to
parse the data out of the responseHeader or out of any client side
SolrException metadata)
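For anyone who wants to see what that parsing amounts to, here is a rough sketch based only on the wire format visible in the {{maxErrors=1}} example above: the metadata is a flat list of alternating keys and values, and tolerated errors use keys of the form {{org.apache.solr.common.ToleratedUpdateError--TYPE:ID}}. The function name {{errors_from_metadata}} is made up for illustration; this is not the {{ToleratedUpdateError}} API itself.

```python
PREFIX = "org.apache.solr.common.ToleratedUpdateError--"


def errors_from_metadata(metadata):
    """Extract tolerated errors from a flat [key, value, key, value, ...] list."""
    errors = []
    # Pair each key (even index) with the value that follows it (odd index).
    for key, message in zip(metadata[0::2], metadata[1::2]):
        if not key.startswith(PREFIX):
            continue  # skip other entries, e.g. "error-class"
        # Split on the first ':' only; ids such as "malformed:[" contain colons.
        cmd_type, _, cmd_id = key[len(PREFIX):].partition(":")
        errors.append({"type": cmd_type, "id": cmd_id, "message": message})
    return errors
```

The result has the same shape as the {{errors}} list in the responseHeader, so client code could treat both sources uniformly.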
----
There are still a bunch of {{nocommit}} comments, but they are almost all
related to either:
* adding tests
* adding docs
* refactoring / hardening some internal APIs
* removing suspected unnecessary "isLeader" code (once tests are final)
I'll keep working on those, but I'd appreciate feedback from folks on how
things currently stand.
Even if you don't understand/care about the internals, thoughts on the user
facing API would be appreciated.
> Update Handlers abort with bad documents
> ----------------------------------------
>
> Key: SOLR-445
> URL: https://issues.apache.org/jira/browse/SOLR-445
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 1.3
> Reporter: Will Johnson
> Assignee: Hoss Man
> Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch,
> SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch,
> SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures
> mid-batch? I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts. It would seem like it
> should either fail the entire batch or log a message/return a code and then
> continue on to add doc 3. Option 1 would seem to be much harder to
> accomplish and possibly require more memory while Option 2 would require more
> information to come back from the API. I'm about to dig into this but I
> thought I'd ask to see if anyone had any suggestions, thoughts or comments.
>