[ https://issues.apache.org/jira/browse/SOLR-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390024#comment-17390024 ]
Chris M. Hostetter commented on SOLR-15540:
-------------------------------------------

In Solr7 the treatment of the {{_root_}} field (if it existed in the schema) was very inconsistent depending on whether any given document being added/updated included nested children or not. This was the cause of various bugs and inconsistencies with update and deleteById which were fixed in 8.x, but it appears this "fix" didn't account for the possibility of:
* upgrading from 7.x …
* … w/schema that includes {{_root_}} field …
* … but the {{_root_}} field doesn't exist in some/all documents

I can reproduce the described problem using the 7.7.2 techproducts configs (AFAICT the {{_default}} configs from 7.7.2 should be equally affected) … details below.

IIUC there are 4 possible scenarios for people upgrading from 7.x to 8.x…
# Your existing schema doesn't include a {{_root_}} field
** you should be unaffected by this problem
# Your existing schema includes a {{_root_}} field, and all documents in your collection are nested documents (ie: every document has a value in the {{_root_}} field)
** you should be unaffected by this problem
# Your existing schema includes a {{_root_}} field, but you have no nested documents in your collection (ie: no document has any value in the {{_root_}} field)
** You should be able to work around this problem by removing the {{_root_}} field just before or just after upgrading to Solr 8
** if you've already updated some documents before removing the {{_root_}} field, you may need to re-update/delete the duplicates manually
# Your existing schema includes a {{_root_}} field, and some of your documents are nested, but some documents have no children (ie: some docs have values in the {{_root_}} field, while other docs do not)
** I don't think there is any workaround for this situation except to deleteByQuery to remove all the docs w/o children (either before or after upgrading to 8.x) and then re-add them after upgrading

Example of reproducing this problem, and demonstrating the workaround of removing the {{_root_}} field…
{code:java}
### 7.7.2 create techproducts example w/docs...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2]
$ bin/solr -e techproducts
...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2]
$ bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 15596 to stop gracefully.

### save our solr home for later re-use in "upgrade"...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2]
$ cp -r example/techproducts/solr/ /tmp/solr-home

### Now start solr 8.8.2 using the solr-home from 7.7.2 ....
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ bin/solr -s /tmp/solr-home
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=17201). Happy searching!

### confirm we have one doc named "solr" and get its uniqueKey...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"name:solr",
      "fl":"id,name,_root_"}},
  "response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"}]
  }}

### attempt to update this document and see the bug manifest...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary '[{"id":"SOLR1000", "name":"Solr name changed"}]'
{
  "responseHeader":{
    "status":0,
    "QTime":152}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
  "responseHeader":{
    "status":0,
    "QTime":3,
    "params":{
      "q":"name:solr",
      "fl":"id,name,_root_"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr name changed"},
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"}]
  }}

### now we have 2 docs with same uniqueKey

### Workaround by removing the (unneeded) _root_ field from schema,
### and update the doc again (will "overwrite" both existing docs)
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl -X POST -H 'Content-type:application/json' --data-binary '{ "delete-field":{"name":"_root_"} }' http://localhost:8983/solr/techproducts/schema
{
  "responseHeader":{
    "status":0,
    "QTime":257}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"name:solr",
      "fl":"id,name"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr name changed"},
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"}]
  }}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary '[{"id":"SOLR1000", "name":"Solr name changed after root field removed"}]'
{
  "responseHeader":{
    "status":0,
    "QTime":63}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2]
$ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"name:solr",
      "fl":"id,name"}},
  "response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr name changed after root field removed"}]
  }}
{code}

> Duplicated adding document for update when Solr7 upgrade to Solr8
> -----------------------------------------------------------------
>
>                 Key: SOLR-15540
>                 URL: https://issues.apache.org/jira/browse/SOLR-15540
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.8.2
>         Environment: SolrCloud Solr 8.8.2
>            Reporter: samuel ma
>            Priority: Major
>
> We upgraded Solr 7.7.2 to Solr 8.8.2 and kept using the Solr 7 index data; the query operation is fine. But when we try to add a doc with an existing doc id (i.e. an update operation) to Solr8, we see 2 docs with the same id, which means the update did not remove the Solr7 doc.
> Below is the schema.xml configuration for Solr7.
> {code:java}
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
> <uniqueKey>id</uniqueKey>
> <field name="_root_" type="string" indexed="true" stored="false"/>
> {code}
>
> Fields added in Solr8:
> <field name="_nest_path_" type="_nest_path_" stored="true"/>
> <fieldType name="_nest_path_" class="solr.NestPathField" />
>
> I can see in Solr7 code
> {code:java}
> DirectUpdateHandler2.updateDocOrDocValues{code}
> uses the idTerm as updateTerm, but in this case Solr8 uses rootTerm as the updateTerm. Is this expected behavior? How do we handle this incompatibility?
>
> Additional comment:
> This also impacts deleteById

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
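To make the mechanism in the comment and the reporter's {{DirectUpdateHandler2}} observation concrete, here is a toy Java model. It is not Solr's code: {{UpdateTermSketch}}, {{ToyIndex}}, {{add7x}} and {{add8x}} are hypothetical names. It assumes, as described in this issue, that 7.x left {{_root_}} empty on childless docs and updated by the {{id}} term, while 8.x with {{_root_}} in the schema fills {{_root_}} on every doc and updates by the {{_root_}} term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a simplified, hypothetical illustration of the behavior
// reported in SOLR-15540, not Solr's actual DirectUpdateHandler2 code.
class UpdateTermSketch {

    // Hypothetical in-memory "index": each document maps field name -> value.
    static final class ToyIndex {
        final List<Map<String, String>> docs = new ArrayList<>();

        // Update by term: delete every doc whose termField equals termValue,
        // then add the new doc. Docs lacking termField can never match.
        void updateDocument(String termField, String termValue, Map<String, String> doc) {
            docs.removeIf(d -> termValue.equals(d.get(termField)));
            docs.add(doc);
        }

        long countWithId(String id) {
            return docs.stream().filter(d -> id.equals(d.get("id"))).count();
        }
    }

    // Assumed 7.x behavior for a childless doc: no _root_ value, update term is id.
    static void add7x(ToyIndex index, String id, String name) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("name", name);
        index.updateDocument("id", id, doc);
    }

    // Assumed 8.x behavior: if the schema declares _root_, every doc gets
    // _root_ = its own id and _root_ becomes the update term; otherwise id is used.
    static void add8x(ToyIndex index, boolean schemaHasRoot, String id, String name) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("name", name);
        if (schemaHasRoot) {
            doc.put("_root_", id);
            index.updateDocument("_root_", id, doc);
        } else {
            index.updateDocument("id", id, doc);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        add7x(index, "SOLR1000", "Solr, the Enterprise Search Server");

        // Update after upgrading, with _root_ still in the schema:
        // the old doc has no _root_ value, so the _root_ term misses it.
        add8x(index, true, "SOLR1000", "Solr name changed");
        System.out.println(index.countWithId("SOLR1000")); // prints 2

        // After deleting _root_ from the schema, the id term matches both copies.
        add8x(index, false, "SOLR1000", "Solr name changed after root field removed");
        System.out.println(index.countWithId("SOLR1000")); // prints 1
    }
}
```

The single {{removeIf}} carries the whole argument: a delete-by-term can only remove docs that actually carry that term, so a doc indexed by 7.x without a {{_root_}} value survives an update keyed on {{_root_}}, mirroring the numFound=2 / numFound=1 sequence in the reproduction.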
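The four upgrade scenarios in the comment reduce to a decision on two counts. The Java sketch below only restates them; {{RootFieldUpgradeCheck}}, {{classify}} and {{advice}} are hypothetical names, and it assumes you have already gathered the total doc count and the number of docs with any {{_root_}} value (presumably via a query such as {{_root_:[* TO *]}} against the 7.x index, since the field is indexed; the comment itself does not prescribe how).

```java
// Hypothetical helper that restates the four upgrade scenarios from the
// SOLR-15540 comment; the counts are inputs you must gather yourself.
class RootFieldUpgradeCheck {

    enum Scenario { NO_ROOT_FIELD, ALL_NESTED, NONE_NESTED, MIXED }

    // docsWithRoot: docs having any value in _root_; totalDocs: all docs.
    static Scenario classify(boolean schemaHasRoot, long docsWithRoot, long totalDocs) {
        if (!schemaHasRoot) {
            return Scenario.NO_ROOT_FIELD;   // scenario 1
        }
        if (docsWithRoot == totalDocs) {
            // Scenario 2; also covers an empty collection (nothing to duplicate).
            return Scenario.ALL_NESTED;
        }
        if (docsWithRoot == 0) {
            return Scenario.NONE_NESTED;     // scenario 3
        }
        return Scenario.MIXED;               // scenario 4
    }

    // Paraphrases the suggested handling for each scenario.
    static String advice(Scenario s) {
        switch (s) {
            case NO_ROOT_FIELD:
            case ALL_NESTED:
                return "should be unaffected";
            case NONE_NESTED:
                return "remove _root_ from the schema just before or just after upgrading; "
                        + "re-update/delete any duplicates already created";
            case MIXED:
            default:
                return "deleteByQuery the docs without children, then re-add them after upgrading";
        }
    }

    public static void main(String[] args) {
        long total = 10;
        System.out.println(advice(classify(false, 0, total)));
        System.out.println(advice(classify(true, total, total)));
        System.out.println(advice(classify(true, 0, total)));
        System.out.println(advice(classify(true, 3, total)));
    }
}
```

Checking {{docsWithRoot == totalDocs}} before {{docsWithRoot == 0}} is a deliberate choice so that an empty collection is reported as unaffected rather than being told to drop {{_root_}}.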