Gregory Chanan created SOLR-6137:
------------------------------------

             Summary: Managed Schema / Schemaless and SolrCloud concurrency issues
                 Key: SOLR-6137
                 URL: https://issues.apache.org/jira/browse/SOLR-6137
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis, SolrCloud
            Reporter: Gregory Chanan


This is a follow-up to a message on the mailing list, linked here: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E

The Managed Schema integration with SolrCloud seems pretty limited.

The issues I'm running into are all variants of one problem: schema changes are 
not pushed to all shards/replicas synchronously.  So, for example, I can make 
the following two requests:
1) add a field to the collection on server1 using the Schema API
2) add a document with the new field; the document is routed to a core on 
server2

Then, there appears to be a race between when the document is processed by the 
core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets 
the new schema.  If the document is processed first, I get a 400 error because 
the field doesn't exist.  This is easily reproducible by adding a sleep to the 
ZkIndexSchemaReader's processing.
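
To make the ordering concrete, here is a minimal, deterministic sketch of the 
race in Python.  It is a toy model, not Solr code: the Core and SharedSchema 
classes and the deliver_watches() step are hypothetical stand-ins for a core, 
the schema stored in ZooKeeper, and the delayed ZkIndexSchemaReader update.

```python
class Core:
    """Toy stand-in for a Solr core holding a local copy of the schema."""

    def __init__(self, name):
        self.name = name
        self.fields = {"id"}

    def add_document(self, doc):
        # Reject documents that use fields missing from the local schema copy,
        # mirroring the 400 error described above.
        unknown = sorted(set(doc) - self.fields)
        if unknown:
            return 400, "unknown field(s): " + ", ".join(unknown)
        return 200, "ok"


class SharedSchema:
    """Toy stand-in for the schema stored in ZooKeeper (hypothetical)."""

    def __init__(self, cores):
        self.fields = {"id"}
        self.cores = cores
        self.pending = []  # cores that have not yet reloaded the schema

    def add_field(self, field, via):
        # The core that handles the Schema API call sees the change at once;
        # every other core only sees it when its watch callback runs.
        self.fields.add(field)
        via.fields = set(self.fields)
        self.pending = [c for c in self.cores if c is not via]
        return 200

    def deliver_watches(self):
        # Simulates the delayed schema reload on the remaining cores.
        for core in self.pending:
            core.fields = set(self.fields)
        self.pending = []


server1, server2 = Core("server1"), Core("server2")
schema = SharedSchema([server1, server2])

schema.add_field("title", via=server1)  # request 1 returns success
early = server2.add_document({"id": "1", "title": "hello"})  # before reload
schema.deliver_watches()
late = server2.add_document({"id": "1", "title": "hello"})  # after reload
print(early, late)
```

The same document is rejected or accepted depending only on whether it reaches 
server2 before or after the schema reload.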

I hit a similar issue with Schemaless: the distributed request handler sends 
out the document updates, but there is no guarantee that the other 
shards/replicas see the schema changes made by the update.chain.

Another issue I noticed today: making multiple Schema API calls concurrently 
can block; that is, one may get through and the other may loop indefinitely.
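
One common way to keep concurrent writers from spinning forever is a 
versioned compare-and-set with a bounded number of retries (ZooKeeper's 
setData accepts an expected version and fails on a mismatch).  The sketch 
below illustrates that general pattern with an in-memory store; it is an 
assumption about how this could be approached, not a description of Solr's 
code, and VersionedStore, add_field, and max_attempts are hypothetical names.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a versioned node (hypothetical)."""

    def __init__(self, fields):
        self._lock = threading.Lock()
        self._fields = frozenset(fields)
        self._version = 0

    def read(self):
        with self._lock:
            return self._fields, self._version

    def compare_and_set(self, new_fields, expected_version):
        # Succeeds only if nobody else has written since we read.
        with self._lock:
            if self._version != expected_version:
                return False
            self._fields = frozenset(new_fields)
            self._version += 1
            return True


def add_field(store, field, max_attempts=10):
    """Add a field, re-reading the latest schema after each lost race."""
    for _ in range(max_attempts):
        fields, version = store.read()
        if field in fields:
            return version  # already present, nothing to do
        if store.compare_and_set(fields | {field}, version):
            return version + 1
    # Bounded retries: fail loudly instead of looping indefinitely.
    raise RuntimeError("gave up adding field %r" % field)


store = VersionedStore({"id"})
threads = [
    threading.Thread(target=add_field, args=(store, name))
    for name in ("title", "author", "year")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_fields, final_version = store.read()
print(sorted(final_fields), final_version)
```

Each failed attempt means another writer succeeded, so with a fresh read on 
every retry the writers make progress rather than blocking one another.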

So, for reference, the issues include:
1) Schema API changes return success before all cores are updated; subsequent 
calls attempting to use the new schema may fail
2) Schemaless changes may fail on replicas/other shards for the same reason
3) Concurrent Schema API changes may block

From Steve Rowe on the mailing list:
{quote}
For Schema API users, delaying a couple of seconds after adding fields before 
using them should workaround this problem.  While not ideal, I think schema 
field additions are rare enough in the Solr collection lifecycle that this is 
not a huge problem.

For schemaless users, the picture is worse, as you noted.  Immediate 
distribution of documents triggering schema field addition could easily prove 
problematic.  Maybe we need a schema update blocking mode, where after the ZK 
schema node watch is triggered, all new request processing is halted until the 
schema is finished downloading/parsing/swapping out? (Such a mode should help 
Schema API users too.)
{quote}
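
As a rough illustration of the "schema update blocking mode" idea quoted 
above, the following Python sketch holds new requests from the moment the 
watch fires until the new schema has been swapped in.  It is only a sketch 
under stated assumptions: SchemaGate and its method names are hypothetical, 
and the 503-on-timeout behavior is my own choice, not something proposed in 
the quote.

```python
import threading


class SchemaGate:
    """Holds new requests while a schema reload is in flight (hypothetical)."""

    def __init__(self, fields):
        self._cond = threading.Condition()
        self._fields = set(fields)
        self._reloading = False

    def watch_triggered(self):
        # Called when the schema watch fires: start holding new requests.
        with self._cond:
            self._reloading = True

    def reload_finished(self, new_fields):
        # Called once the new schema is downloaded, parsed, and swapped in.
        with self._cond:
            self._fields = set(new_fields)
            self._reloading = False
            self._cond.notify_all()

    def process(self, doc, timeout=5.0):
        with self._cond:
            # Wait until no reload is in progress, or give up after timeout.
            if not self._cond.wait_for(lambda: not self._reloading, timeout):
                return 503
            return 200 if set(doc) <= self._fields else 400


gate = SchemaGate({"id"})
gate.watch_triggered()

results = []
worker = threading.Thread(
    target=lambda: results.append(gate.process({"id": "1", "title": "hello"}))
)
worker.start()
gate.reload_finished({"id", "title"})
worker.join()
print(results)
```

Note that a gate like this only helps once the watch has actually fired on 
the receiving core; a document that arrives before the watch notification 
would still be checked against the old schema.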



