[
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062508#comment-14062508
]
Gregory Chanan commented on SOLR-6137:
--------------------------------------
Thanks [[email protected]]! Your changes make sense.
bq. Schema API changes return success before all cores are updated; subsequent
calls attempting to use new schema may fail
I filed SOLR-6249 for this.
bq. One small issue I noticed is that there is a race between parsing and
schema addition.
I filed SOLR-6250 for this.
bq. Anything else?
Nope.
> Managed Schema / Schemaless and SolrCloud concurrency issues
> ------------------------------------------------------------
>
> Key: SOLR-6137
> URL: https://issues.apache.org/jira/browse/SOLR-6137
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis, SolrCloud
> Reporter: Gregory Chanan
> Attachments: SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch,
> SOLR-6137v3.patch, SOLR-6137v4.patch
>
>
> This is a follow up to a message on the mailing list, linked here:
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issues I'm running into are all variants of one problem: schema changes
> are not pushed to all shards/replicas synchronously. So, for example, I can
> make the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field; the document is routed to a core on
> server2
> Then, there appears to be a race between when the document is processed by
> the core on server2 and when the core on server2, via the
> ZkIndexSchemaReader, gets the new schema. If the document is processed
> first, I get a 400 error because the field doesn't exist. This is easily
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends
> out the document updates, but there is no guarantee that the other
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple Schema API calls concurrently
> can block; that is, one may get through and the other may loop forever.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before
> using them should work around this problem. While not ideal, I think schema
> field additions are rare enough in the Solr collection lifecycle that this is
> not a huge problem.
> For schemaless users, the picture is worse, as you noted. Immediate
> distribution of documents triggering schema field addition could easily prove
> problematic. Maybe we need a schema update blocking mode, where after the ZK
> schema node watch is triggered, all new request processing is halted until
> the schema is finished downloading/parsing/swapping out? (Such a mode should
> help Schema API users too.)
> {quote}
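
The Schema API workaround quoted above (pausing briefly after adding a field before using it) can also be approximated on the client with a retry loop. The sketch below is only an illustration, in Python, against a simulated core: SimulatedCore, FieldNotInSchemaError, and add_with_retry are hypothetical names and not part of Solr or SolrJ, and the delays are arbitrary. It assumes the failure surfaces as a distinguishable error (the 400 described above) and that retrying the same document is safe.

```python
import time


class FieldNotInSchemaError(Exception):
    """Stand-in for the HTTP 400 a core returns when a field is unknown."""


class SimulatedCore:
    """Hypothetical core whose schema picks up a new field only after a lag."""

    def __init__(self, schema_lag_seconds):
        # The new field becomes visible to this core at this point in time.
        self._visible_at = time.monotonic() + schema_lag_seconds
        self.documents = []

    def add_document(self, doc):
        # Before the schema update arrives, the add is rejected.
        if time.monotonic() < self._visible_at:
            raise FieldNotInSchemaError("unknown field 'new_field'")
        self.documents.append(doc)


def add_with_retry(core, doc, attempts=5, initial_delay=0.05):
    """Retry the add with exponential backoff; return the number of tries used."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            core.add_document(doc)
            return attempt
        except FieldNotInSchemaError:
            if attempt == attempts:
                raise  # give up: the schema never became visible in time
            time.sleep(delay)
            delay *= 2


# The simulated core sees the new field roughly 0.1 seconds after creation.
core = SimulatedCore(schema_lag_seconds=0.1)
tries = add_with_retry(core, {"id": "1", "new_field": "value"})
print(f"document accepted after {tries} attempt(s)")
```

A retry like this only hides the race from Schema API callers. It does not help the schemaless case, where the failing request is the distributed update itself rather than a separate client call.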