ZK Connection Failure leads to stale data

Dennis Gove Wed, 10 Feb 2016 08:57:03 -0800

Just wanted to take a moment to get anyone's thoughts on the following
issues:


https://issues.apache.org/jira/browse/SOLR-8599
https://issues.apache.org/jira/browse/SOLR-8666

The originating problem occurred due to a DNS failure that caused some
nodes in a cloud setup to fail to connect to ZooKeeper. Those nodes were
running but were not participating in the cloud with the other nodes. The
disconnected nodes would respond to queries with stale data, though they
would reject ingest requests.

Ticket https://issues.apache.org/jira/browse/SOLR-8599 contains a patch
which ensures that if a connection to ZooKeeper fails to be made, it will
be retried. Previously the failure did not lead to a retry, so the node
would just keep running, disconnected, until the node itself was restarted.
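
To illustrate the retry idea in general terms, here is a minimal sketch in
Python. It is not the code from the SOLR-8599 patch; the function name, the
exponential backoff, and the attempt limit are all assumptions made for
illustration only.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: `connect` stands in for whatever opens the
    ZooKeeper session. Assumes max_attempts >= 1.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            # Back off before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed; surface the last error instead of running on
    # silently in a disconnected state.
    raise last_error
```

The point of the sketch is only that a failed first connection (for
example, a transient DNS failure) is followed by further attempts rather
than leaving the node disconnected until a restart.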

Ticket https://issues.apache.org/jira/browse/SOLR-8666 contains a patch
which will result in additional information being returned to the client
when a node may be returning stale data due to not being connected to
ZooKeeper. The intent is to leave current behavior unchanged but allow the
client to know that something might be wrong. If the collection is not
being updated, the data is not stale, so it does not matter that the node
is disconnected from ZooKeeper; if the collection is being updated, the
data may be stale. The headers of the response will now contain an entry to
indicate this. The patch also adds a header to the ping response to provide
notification if the node is disconnected from ZooKeeper.
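
As a rough sketch of how a client might use such a header entry, the Python
snippet below inspects a JSON response body. The entry name "zkConnected"
and its placement under "responseHeader" are assumptions for illustration;
this message does not state the name the patch uses.

```python
import json


def may_be_stale(response_body, flag="zkConnected"):
    """Return True if the response says the node is not connected to ZooKeeper.

    `flag` is an assumed header entry name, not taken from this message.
    """
    header = json.loads(response_body).get("responseHeader", {})
    # A missing entry is treated as "no warning", so responses from nodes
    # without the patch behave exactly as before.
    return header.get(flag) is False
```

A client could log a warning or retry against another node when this
returns True, while ignoring it for collections that are known to be
static.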

I think the approach these patches take is good, but I wanted to get
others' thoughts in case I'm missing a case where they might cause a
problem.

Thanks - Dennis
