
Chris M. Hostetter commented on SOLR-15540:

In Solr 7 the treatment of the {{_root_}} field (if it existed in the schema) 
was very inconsistent, depending on whether any given document being 
added/updated included nested children or not. This was the cause of various 
bugs and inconsistencies with update and deleteById, which were fixed in 8.x, 
but it appears this “fix” didn’t account for the possibility of:
 * upgrading from 7.x …

 * … w/schema that includes {{_root_}} field …

 * … but the {{_root_}} field doesn’t exist in some/all documents

I can reproduce the described problem using the 7.7.2 techproducts configs 
(AFAICT the {{_default}} configs from 7.7.2 should be equally affected) … 
details below.

IIUC there are 4 possible scenarios for people upgrading from 7.x to 8.x…
 # Your existing schema doesn’t include a {{_root_}} field
 ** you should be unaffected by this problem
 # Your existing schema includes a {{_root_}} field, and all documents in your 
collection are nested documents (ie: every document has a value in the 
{{_root_}} field)
 ** you should be unaffected by this problem
 # Your existing schema includes a {{_root_}} field, but you have no nested 
documents in your collection (ie: no document has any value in the {{_root_}} 
field)
 ** You should be able to work around this problem by removing the {{_root_}} 
field just before or just after upgrading to Solr 8 
 ** if you’ve already updated some documents before removing the {{_root_}} 
field you may need to re-update/delete the duplicates manually
 # Your existing schema includes a {{_root_}} field, and some of your documents 
are nested, but some documents have no children (ie: some docs have values in 
the {{_root_}} field, while other docs do not)
 ** I don’t think there is any work-around for this situation except to 
deleteByQuery to remove all the docs w/o children (either before or after 
upgrading to 8.x) and then re-add them after upgrading
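
The four scenarios above can be told apart from three observations: whether the schema defines {{_root_}}, how many documents the collection holds, and how many of them have a value in {{_root_}}. The following is only a rough sketch of that decision logic. The function name {{classify_scenario}} is hypothetical, and the commented-out queries assume the techproducts collection on localhost:8983 used in the reproduction further down.

```shell
#!/bin/sh
# Sketch only: classify_scenario is a hypothetical helper, not part of Solr.
#   $1  "yes" if the schema defines the _root_ field, anything else if not
#   $2  total number of documents in the collection
#   $3  number of documents that have a value in _root_
# Prints the scenario number (1-4) from the list above.
classify_scenario() {
  if [ "$1" != "yes" ]; then
    echo 1            # no _root_ field in the schema
  elif [ "$3" -eq 0 ]; then
    echo 3            # field exists, but no document has a value in it
  elif [ "$3" -eq "$2" ]; then
    echo 2            # every document has a _root_ value
  else
    echo 4            # mix of documents with and without a _root_ value
  fi
}

# One possible way to obtain the two counts (assumption: techproducts
# collection on localhost:8983; read numFound from each response):
#   curl -G 'http://localhost:8983/solr/techproducts/select' \
#        --data-urlencode 'q=*:*' --data-urlencode 'rows=0'
#   curl -G 'http://localhost:8983/solr/techproducts/select' \
#        --data-urlencode 'q=_root_:[* TO *]' --data-urlencode 'rows=0'

classify_scenario yes 100 0   # prints 3
```

The counts in the last line are made-up illustration values, not numbers taken from a real index.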


Example of reproducing this problem, and demonstrating the workaround of 
removing the {{_root_}} field…

### 7.7.2 create techproducts example w/docs...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ 
bin/solr -e techproducts
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ 
bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds 
to allow Jetty process 15596 to stop gracefully.

### save our solr home for later re-use in "upgrade"...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ cp -r 
example/techproducts/solr/ /tmp/solr-home

### Now start solr 8.8.2 using the solr-home from 7.7.2 ....

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ 
bin/solr -s /tmp/solr-home
Waiting up to 180 seconds to see Solr running on port 8983 [|] 
Started Solr server on port 8983 (pid=17201). Happy searching!

### confirm we have one doc named "solr" and get its uniqueKey...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
 "name":"Solr, the Enterprise Search Server"}]

### attempt to update this document and see the bug manifest...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary 
'[{"id":"SOLR1000", "name":"Solr name changed"}]'
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
 "name":"Solr name changed"},
 "name":"Solr, the Enterprise Search Server"}]

### now we have 2 docs with same uniqueKey

### Workaround by removing the (unneeded) _root_ field from schema,
### and update the doc again (will "overwrite" both existing docs)

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl -X 
POST -H 'Content-type:application/json' --data-binary '{ 
"delete-field":{"name":"_root_"} }' 
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
 "name":"Solr name changed"},
 "name":"Solr, the Enterprise Search Server"}]
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary 
'[{"id":"SOLR1000", "name":"Solr name changed after root field removed"}]'
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 
 "name":"Solr name changed after root field removed"}]

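If some documents were already updated before the {{_root_}} field was removed, the duplicated uniqueKey values have to be found before they can be re-updated or deleted. A minimal sketch, assuming the id values have already been exported one per line by whatever means is convenient (for example a select request with {{fl=id}} and {{wt=csv}}, which is an assumption and not something shown in the transcript above). The function name {{duplicate_ids}} is hypothetical.

```shell
#!/bin/sh
# Sketch only: duplicate_ids is a hypothetical helper, not part of Solr.
# Reads uniqueKey values on stdin (one per line) and prints each value
# that occurs more than once, so those docs can be re-updated or deleted.
duplicate_ids() {
  sort | uniq -d
}

# SOLR1000 is the id from the transcript above; EXAMPLE2 is a made-up id.
printf '%s\n' SOLR1000 EXAMPLE2 SOLR1000 | duplicate_ids   # prints SOLR1000
```

Each id printed this way can then be re-sent as a normal update once the {{_root_}} field is gone, which is what the last update in the transcript does for SOLR1000.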

> Duplicated adding document for update when Solr7 upgrade to Solr8
> -----------------------------------------------------------------
>                 Key: SOLR-15540
>                 URL: https://issues.apache.org/jira/browse/SOLR-15540
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.8.2
>         Environment: SolrCloud Solr 8.8.2
>            Reporter: samuel ma
>            Priority: Major
> We upgrade Solr7.7.2 to Solr8.8.2, keep using the Solr 7 index data, the 
> query operation is fine. But when we try to add the doc (with the same doc 
> id, actually is an update operation) to Solr8, we actually see 2 doc with the 
> same id, which means the update did not remove the Solr7 doc.
> Below is the schema.xml configuration for Solr7.
> {code:java}
> <field name="id" type="string" indexed="true" stored="true" required="true" 
> multiValued="false" />
> <uniqueKey>id</uniqueKey>
> <field name="_root_" type="string" indexed="true" stored="false"/>
> {code}
> Add some fields in Solr8
> <field name="_nest_path_" type="_nest_path_" stored="true"/>
> <fieldType name="_nest_path_" class="solr.NestPathField" />
> I can see in Solr7 code
> {code:java}
> DirectUpdateHandler2.updateDocOrDocValues{code}
> use the idTerm as updateTerm, but in this case Solr8 use rootTerm as the 
> updateTerm. is this an expected behavior? how do we handle this incompatible 
> issue? 
> Add comment:
> This also impacts deletebyId  
