[
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-445:
--------------------------
Attachment: SOLR-445.patch
I started playing around with this patch a bit to see if I could help move it
forward. I'm a little out of my depth with a lot of the details of how
distributed updates work, but the more I tried to make sense of it, the more
convinced I was that there were a lot of things that just weren't very well
accounted for in the existing tests (which were consistently failing, but the
failures themselves weren't consistent between runs).
Here's a summary of what's new/different in the patch I'm attaching...
* DistributedUpdateProcessor.DistribPhase
** not sure why this enum was made non-static in earlier patches ... I reverted
this unneeded change.
* TolerantUpdateProcessor
** processDelete
*** Method has a couple of glaringly obvious bugs that apparently aren't
tripped by the current tests
*** added several nocommits for things that jumped out at me
* DistribTolerantUpdateProcessorTest
** beefed up assertion msgs in assertUSucceedsWithErrors
** fixed testValidAdds so it's not dead code
** testInvalidAdds
*** sanity check code wasn't passing reliably
**** details of what failed are lost depending on how update is routed (random
seed)
**** relaxed this check to be reliable with a nocommit comment to see if we can
tighten it up
*** assuming the sanity check passes, assertUSucceedsWithErrors (still) fails
on some seeds with a null error list
**** I'm guessing this is what anshum alluded to in his last comment: "Node2 as of
now return an HTTP OK and doesn't throw an exception, the StreamingSolrClient
used but the Distributed Updated Processor doesn't realize the error that was
consumed by the leader of shard 1"
* TestTolerantUpdateProcessorCloud
** New MiniSolrCloudCluster based test to try and demonstrate all the possible
distrib code paths I could think of (see below)
TestTolerantUpdateProcessorCloud is the real meat of what I've added here.
Starting with the basic behavior/assertions currently tested in
TolerantUpdateProcessorTest, I built it up to try and exercise every possible
distributed update code path I could imagine (updates with docs all on one
shard, some of which fail; updates with docs for diff shards where some from
each shard fail; updates with docs for diff shards but only one shard fails;
etc...) -- but only tested against a MiniSolrCloud collection that actually had
1 node, 1 shard, 1 replica and an HttpSolrClient talking directly to that node.
Once all those assertions were passing, I changed it to use 5 nodes, 2 shards,
2 replicas and started testing all of those scenarios against 5 HttpSolrClients
pointed at every individual node (one of which hosts no replicas) as well as a
ZK aware CloudSolrClient. All 6 tests against all 6 clients currently fail
(reliably) at some point in these scenarios.
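The scenarios above all boil down to one axis: which shards end up hosting the
failing docs. Here's a toy Python model of that (purely illustrative -- this is
*not* Solr's actual routing; Solr's default compositeId router hashes the id
with murmur3, and the function names here are made up for the sketch):

```python
import zlib

NUM_SHARDS = 2

def route(doc_id):
    # Stand-in for shard routing; zlib.crc32 keeps results stable across runs.
    # (Solr's real CompositeIdRouter hashes the id with murmur3.)
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def shards_with_failures(batch):
    """batch is a list of (doc_id, is_bad) pairs; returns the set of shards
    that receive at least one failing doc -- the axis the test scenarios vary:
    failures all on one shard, on every shard, or on only some shards."""
    return {route(doc_id) for doc_id, is_bad in batch if is_bad}
```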
----
Independent of all the things I still need to make sense of in the existing
code to try and help get these tests passing, I still have one big question
about what the desired/expected behavior should be for clients when maxErrors
is exceeded -- at the moment, in single node setups, the client gets a 400
error with the top level "error" section corresponding to whatever error caused
it to exceed maxErrors, but the responseHeader is still populated with the
individual errors and the appropriate numAdds & numErrors, for example...
{code}
$ curl -v -X POST
'http://localhost:8983/solr/techproducts/update?indent=true&commit=true&update.chain=tolerant'
-H 'Content-Type: application/json' --data-binary
'[{"id":"hoss1","foo_i":42},{"id":"bogus1","foo_i":"bogus"},{"id":"hoss2","foo_i":66},{"id":"bogus2","foo_i":"bogus"},{"id":"bogus3","foo_i":"bogus"},{"id":"hoss3","foo_i":42}]'
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8983 (#0)
> POST /solr/techproducts/update?indent=true&commit=true&update.chain=tolerant
> HTTP/1.1
> User-Agent: curl/7.38.0
> Host: localhost:8983
> Accept: */*
> Content-Type: application/json
> Content-Length: 175
>
* upload completely sent off: 175 out of 175 bytes
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain;charset=utf-8
< Transfer-Encoding: chunked
<
{
  "responseHeader":{
    "numErrors":3,
    "errors":{
      "bogus1":{
        "message":"ERROR: [doc=bogus1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      "bogus2":{
        "message":"ERROR: [doc=bogus2] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      "bogus3":{
        "message":"ERROR: [doc=bogus3] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""}},
    "numAdds":2,
    "status":400,
    "QTime":4},
  "error":{
    "msg":"ERROR: [doc=bogus3] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
    "code":400}}
* Connection #0 to host localhost left intact
{code}
...but because this is a 400 error, that means that if you use HttpSolrClient,
you're not going to get access to any of that detailed error information at all
-- you'll just get a RemoteSolrException with the bare details.
* Should the use of this processor force *all* "error" responses to be
rewritten as HTTP 200s?
* Should the solrj clients be updated so that RemoteSolrException still
provides an accessor to get the parsed/structured SolrResponse (assuming the
HTTP response body can be parsed w/o any other errors)?
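For illustration, here's a minimal sketch of what such an accessor might
expose -- in Python rather than SolrJ, with a hypothetical
{{structured_errors}} helper that is not part of any real API -- parsing the
400 response body from the curl example above to show that the per-doc details
are recoverable from the responseHeader even though the request as a whole
failed:

```python
import json

# The 400 response body from the curl example above.
body = '''{
  "responseHeader": {
    "numErrors": 3,
    "errors": {
      "bogus1": {"message": "ERROR: [doc=bogus1] Error adding field 'foo_i'='bogus' msg=For input string: \\"bogus\\""},
      "bogus2": {"message": "ERROR: [doc=bogus2] Error adding field 'foo_i'='bogus' msg=For input string: \\"bogus\\""},
      "bogus3": {"message": "ERROR: [doc=bogus3] Error adding field 'foo_i'='bogus' msg=For input string: \\"bogus\\""}
    },
    "numAdds": 2,
    "status": 400,
    "QTime": 4
  },
  "error": {
    "msg": "ERROR: [doc=bogus3] Error adding field 'foo_i'='bogus' msg=For input string: \\"bogus\\"",
    "code": 400
  }
}'''

def structured_errors(raw_body):
    """Hypothetical accessor: parse the HTTP body of a 'tolerant' 400 response
    and return (numAdds, numErrors, {doc_id: error message})."""
    rsp = json.loads(raw_body)
    hdr = rsp["responseHeader"]
    errors = {doc: e["message"] for doc, e in hdr.get("errors", {}).items()}
    return hdr["numAdds"], hdr["numErrors"], errors
```

The point being: everything a caller needs is already in the body, so an
accessor on RemoteSolrException (or a forced 200) would only need to surface it.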
> Update Handlers abort with bad documents
> ----------------------------------------
>
> Key: SOLR-445
> URL: https://issues.apache.org/jira/browse/SOLR-445
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 1.3
> Reporter: Will Johnson
> Assignee: Anshum Gupta
> Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch,
> SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch,
> SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures mid
> batch? I.e.:
> <add>
> <doc>
> <field name="id">1</field>
> </doc>
> <doc>
> <field name="id">2</field>
> <field name="myDateField">I_AM_A_BAD_DATE</field>
> </doc>
> <doc>
> <field name="id">3</field>
> </doc>
> </add>
> Right now solr adds the first doc and then aborts. It would seem like it
> should either fail the entire batch or log a message/return a code and then
> continue on to add doc 3. Option 1 would seem to be much harder to
> accomplish and possibly require more memory while Option 2 would require more
> information to come back from the API. I'm about to dig into this but I
> thought I'd ask to see if anyone had any suggestions, thoughts or comments.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)