Hi Calvin,

Thanks for this very comprehensive issue report!

Yes, a JIRA ticket for this sounds like the next logical step here then.

And I'd be curious if perhaps we can expand the existing test coverage to model 
this scenario, maybe in 
https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/update/processor/AtomicUpdatesTest.java
 or 
https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/update/processor/AbstractAtomicUpdatesMultivalueTestBase.java
 or so.

Best wishes,
Christine

From: users@solr.apache.org At: 05/01/24 20:47:24 UTC+1:00To:  
solr-u...@lucene.apache.org
Subject: json atomic update error when using multiple modifiers (e.g., {"id": 
"1", "myfield": {"add": "x", "remove": "y"}})

Dear Solr Users,

I ran into a problem doing a json atomic update that tries to both `add`
and `remove`
a value for a multivalued field in a single update. I saw it initially in
an instance that runs
9.5.0, and reproduced a minimal example using Solr 9.6.0.

The only fields defined in the schema are:

  <field name="id" type="string" multiValued="false" indexed="true"
required="true" stored="true"/>
  <field name="_version_" type="plong" indexed="false" stored="false"/>
  <field name="name" type="strings" indexed="true" stored="true"/>

`<uniqueKey>id</uniqueKey>` is also present, so I'm supplying docs with
just an `id` and a multivalued `name` field. The real setup is more complex,
but this is a minimal test case to illustrate the problem.

Starting with an empty index, I add the following doc to the index
successfully:

{"id": "1", "name": ["John Doe", "Jane Doe"]}

And I can query it, seeing the expected result:

{
  "responseHeader":{
    "status":0,
    "QTime":23,
    "params":{
      "q":"name:*"
    }
  },
  "response":{
    "numFound":1,
    "start":0,
    "numFoundExact":true,
    "docs":[{
      "id":"1",
      "name":["John Doe","Jane Doe"],
      "_version_":1797873599884820480
    }]
  }
}

Next, I send an atomic update to modify the `name` field of that document
by removing
`Jane Doe` and adding `Janet Doe`:

{"id": "1", "name": {"add": "Janet Doe", "remove": "Jane Doe"}}

This atomic update that does both an `add` and a `remove` is something that
used to
work for us under Solr 6.6, but we just noticed that it fails in our
production 9.5
instance and in 9.6, which I just downloaded to test against.

The error in the solr.log indicates that Solr mistakenly interprets the
`{"add": "Janet Doe", "remove": "Jane Doe"}` as a nested document and then
throws an exception because our schema doesn't have the  `_root_` field that
would be expected if we were using nested docs (which we don't use).

Here's the full stack trace from `solr.log` (version 9.6.0):

2024-05-01 17:49:02.479 ERROR (qtp2059461664-30-0.0.0.0-3) [c: s: r:
x:atris t:0.0.0.0-3] o.a.s.h.RequestHandlerBase Client exception =>
org.apache.solr.common.SolrException: Unable to index docs with children:
the schema must include definitions for both a uniqueKey field and the
'_root_' field, using the exact same fieldType
        at
org.apache.solr.update.DocumentBuilder.unexpectedNestedDocException(DocumentBuil
der.java:369)
org.apache.solr.common.SolrException: Unable to index docs with children:
the schema must include definitions for both a uniqueKey field and the
'_root_' field, using the exact same fieldType
        at
org.apache.solr.update.DocumentBuilder.unexpectedNestedDocException(DocumentBuil
der.java:369)
~[?:?]
        at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:153)
~[?:?]
        at
org.apache.solr.update.AddUpdateCommand.makeLuceneDocs(AddUpdateCommand.java:213
)
~[?:?]
        at
org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHan
dler2.java:1056)
~[?:?]
        at
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.
java:421)
~[?:?]
        at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:37
5)
~[?:?]
        at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:312
)
~[?:?]
        at
org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.pr
ocessAdd(RunUpdateProcessorFactory.java:76)
~[?:?]
        at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequest
Processor.java:54)
~[?:?]
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(Distribut
edUpdateProcessor.java:270)
~[?:?]
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(Distrib
utedUpdateProcessor.java:533)
~[?:?]
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(
DistributedUpdateProcessor.java:358)
~[?:?]
        at
org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:43)
~[?:?]
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Distribut
edUpdateProcessor.java:355)
~[?:?]
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(Distribut
edUpdateProcessor.java:236)
~[?:?]
        at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.pr
ocessAdd(LogUpdateProcessorFactory.java:111)
~[?:?]
        at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(Js
onLoader.java:553)
~[?:?]
        at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate
(JsonLoader.java:183)
~[?:?]
        at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoad
er.java:151)
~[?:?]
        at
org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:86) ~[?:?]
        at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:10
2)
~[?:?]
        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStream
HandlerBase.java:100)
~[?:?]
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java
:226)
~[?:?]
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2886) ~[?:?]
        at
org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:910)
~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:596)
~[?:?]
        at
org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:262)
~[?:?]
        at
org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.
java:219)
~[?:?]
        at
org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.jav
a:249)
~[?:?]
        at
org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:215)
~[?:?]
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:213)
~[?:?]
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
~[?:?]
        at
org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
~[jetty-servlet-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635
)
~[jetty-servlet-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527)
~[jetty-servlet-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:598)
~[jetty-security-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223
)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:158
0)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221
)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:138
4)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
~[jetty-servlet-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1553
)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306
)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerC
ollection.java:149)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java
:228)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java
:141)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:301)
~[jetty-rewrite-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
~[jetty-server-10.0.20.jar:10.0.20]
        at org.eclipse.jetty.server.Server.handle(Server.java:563)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.ja
va:1598)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
~[jetty-server-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnectio
n.java:314)
~[jetty-io-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
~[jetty-io-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.j
ava:53)
~[jetty-io-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(Adaptiv
eExecutionStrategy.java:421)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(Ada
ptiveExecutionStrategy.java:390)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(Adap
tiveExecutionStrategy.java:277)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(Adaptiv
eExecutionStrategy.java:193)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.
java:1194)
~[jetty-util-10.0.20.jar:10.0.20]
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:
1149)
~[jetty-util-10.0.20.jar:10.0.20]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]

In case it matters, the update handler was called with `commitWithin=1000`.

I believe this is happening because as part of commit 5a9a34daedb3 on
2021-06-12
("SOLR-15044: JSON Loading: nested docs don't need ID"), the `isChildDoc`
method
of solr/core/src/java/org/apache/solr/handler/loader/JsonLoader.java was
changed:

+    /** Is this a child doc (true) or a partial update (false)? */
     private boolean isChildDoc(SolrInputDocument extendedFieldValue) {
-      return
extendedFieldValue.containsKey(req.getSchema().getUniqueKeyField().getName());
+      if (extendedFieldValue.size() != 1) {
+        return true;
+      }
+      // if the only key is a field in the schema, assume it's a child doc
+      final String fieldName =
extendedFieldValue.iterator().next().getName();
+      return req.getSchema().getFieldOrNull(fieldName) != null;
+      // otherwise, assume it's "set" or some other verb for a partial
update.
+      // NOTE: it's fundamentally ambiguous with JSON; this is a best
effort try.
     }

In case that diff gets reformatted and garbled, it would previously have
checked
if the `id` field was a  key in the `extendedFieldValue`, which would have
been `false`, but it now returns `true` if the `extendedFieldValue.size()`
is not
equal to `1`, which is the case here because I have both an `add` and a
`remove`
in the extended field value. It does work fine if I submit two separate
updates, one for each of the two operations.

For now, we're going to do the `add` and `remove` in separate
updates and make sure we don't have other cases of multiple
update operations, but it would be nice if users weren't forced to do that.

Let me know if I should create a Jira issue for this, or if any
other information would be helpful.

This was actually the second time we were affected by the `isChildDoc`
change I mentioned above. The first issue, when we upgraded from 6.6 to 9.5
last year,
we worked around by renaming an existing field from `set` to `sset`. The
problem
was that an atomic update like `{"id": "1", "name": {"set": ["J Doe"]}}`
would fail with the same error message because since `set` was a field name,
`return req.getSchema().getFieldOrNull(fieldName) != null` would evaluate
to `true`
for `isChildDoc`. After we renamed the field to `sset`, then
`req.getSchema().getFieldOrNull(fieldName)` would be null for `set` and so
`isChildDoc` would evaluate to `false`, as desired.

Thanks,
Calvin


Reply via email to