Gregory Chanan created SOLR-6137:
------------------------------------
Summary: Managed Schema / Schemaless and SolrCloud concurrency issues
Key: SOLR-6137
URL: https://issues.apache.org/jira/browse/SOLR-6137
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Reporter: Gregory Chanan
This is a follow up to a message on the mailing list, linked here:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
The Managed Schema integration with SolrCloud seems pretty limited.
The problems I'm running into are all variants of one issue: schema changes are
not pushed to all shards/replicas synchronously. For example, I can make the
following two requests:
1) add a field to the collection on server1 using the Schema API
2) add a document with the new field; the document is routed to a core on
server2
Then, there appears to be a race between when the document is processed by the
core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets
the new schema. If the document is processed first, I get a 400 error because
the field doesn't exist. This is easily reproducible by adding a sleep to the
ZkIndexSchemaReader's processing.
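
Until the race is fixed, a client can paper over it by retrying the add when it
gets a 400. The following is only a minimal sketch of that idea, not anything
Solr ships: `add_with_retry` and the `send` callable (which posts one document
and returns the HTTP status code) are hypothetical names introduced for
illustration. A 400 can also mean the document really is invalid, so the retry
count is bounded.

```python
import time
from typing import Callable


def add_with_retry(send: Callable[[dict], int], doc: dict,
                   attempts: int = 5, delay: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Send `doc`, retrying with exponential backoff while the status is 400.

    `send` is a hypothetical callable supplied by the caller; it is assumed to
    post the document to the collection and return the HTTP status code.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    status = 0
    for attempt in range(attempts):
        status = send(doc)
        if status != 400:
            # Success, or an error that more waiting will not cure.
            return status
        if attempt < attempts - 1:
            # The target core may not have loaded the new schema yet.
            sleep(delay * (2 ** attempt))
    return status  # still 400 after every attempt
```

This only hides the window from the client; the core on server2 still processes
the first attempt against the old schema.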
I hit a similar issue with Schemaless: the distributed request handler sends
out the document updates, but there is no guarantee that the other
shards/replicas see the schema changes made by the update.chain.
Another issue I noticed today: making multiple Schema API calls concurrently
can block; that is, one may get through while the other loops indefinitely.
So, for reference, the issues include:
1) Schema API changes return success before all cores are updated; subsequent
calls attempting to use new schema may fail
2) Schemaless changes may fail on replicas/other shards for the same reason
3) Concurrent Schema API changes may block
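
For item 3, one generic way to keep concurrent writers from spinning forever is
a versioned compare-and-set with a bounded number of retries, re-reading the
latest state on every attempt. The sketch below is an assumption-laden
illustration of that pattern and says nothing about the actual cause in Solr:
`VersionedStore` is a hypothetical in-memory stand-in for a versioned node, not
ZooKeeper's API, and `add_field` is likewise a made-up name.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned schema node."""

    def __init__(self, fields=()):
        self._lock = threading.Lock()
        self._fields = frozenset(fields)
        self._version = 0

    def read(self):
        """Return the current (fields, version) pair atomically."""
        with self._lock:
            return self._fields, self._version

    def compare_and_set(self, new_fields, expected_version):
        """Write only if nobody else has written since `expected_version`."""
        with self._lock:
            if self._version != expected_version:
                return False
            self._fields = frozenset(new_fields)
            self._version += 1
            return True


def add_field(store, name, max_attempts=10):
    """Add `name` to the schema, giving up after `max_attempts` conflicts."""
    for _ in range(max_attempts):
        # Re-read on every attempt; retrying against a stale version can
        # never succeed and is one way to end up looping forever.
        fields, version = store.read()
        if name in fields:
            return version
        if store.compare_and_set(fields | {name}, version):
            return version + 1
    raise RuntimeError("gave up adding %r after %d attempts" % (name, max_attempts))
```

The bound turns a potential endless loop into an error the caller can see and
report.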
From Steve Rowe on the mailing list:
{quote}
For Schema API users, delaying a couple of seconds after adding fields before
using them should work around this problem. While not ideal, I think schema
field additions are rare enough in the Solr collection lifecycle that this is
not a huge problem.
For schemaless users, the picture is worse, as you noted. Immediate
distribution of documents triggering schema field addition could easily prove
problematic. Maybe we need a schema update blocking mode, where after the ZK
schema node watch is triggered, all new request processing is halted until the
schema is finished downloading/parsing/swapping out? (Such a mode should help
Schema API users too.)
{quote}
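
The blocking mode suggested in the quote could look roughly like the gate
sketched below: once a schema swap starts, new requests wait, in-flight
requests drain, and only then is the new schema installed. This is a sketch
under stated assumptions, not Solr code; `SchemaGate`, `process`, `swap`, and
the `load_schema` callable are hypothetical names. It also only covers the
window after the watch has fired on a node, not the time before the node
learns of the change.

```python
import threading


class SchemaGate:
    """Hypothetical gate that halts new requests while the schema is swapped."""

    def __init__(self, schema):
        self._cond = threading.Condition()
        self._schema = schema
        self._active = 0        # requests currently being processed
        self._updating = False  # True while a swap is pending or running

    def process(self, handler):
        """Run `handler(schema)`, waiting first if a swap is in progress."""
        with self._cond:
            while self._updating:
                self._cond.wait()
            self._active += 1
            schema = self._schema
        try:
            return handler(schema)
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def swap(self, load_schema):
        """Block new requests, drain in-flight ones, then install the schema."""
        with self._cond:
            while self._updating:
                self._cond.wait()
            self._updating = True
            try:
                while self._active:
                    self._cond.wait()  # releases the lock while waiting
                self._schema = load_schema()
            finally:
                self._updating = False
                self._cond.notify_all()
```

Here `load_schema` stands in for the download/parse step; it runs while new
requests are held back, so a slow load stalls request processing for just as
long, which is the trade-off of this mode.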