[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940243#comment-17940243 ]
Rahul Goswami commented on SOLR-17725:
--------------------------------------

The attached document outlines an example where the upgrade tool works on an index originally created in Solr 7.x, AFTER an upgrade to Solr 8.x. Key points:

1) Lucene version X can read an index created in version X-1. New segments are always written with the latest version's codec.

2) When segments are merged, the resulting segment carries a "minVersion" stamp, which is the lowest version among the segments participating in the merge.

3) The segments_* file in a Lucene index records the Lucene version in which the index was first created.

The design doc outlines the process of converting all segments to the new version. It is essentially a pull model: you first upgrade Solr and then "pull" the index forward to the current version. By the end of the process outlined in the doc, all segments have been converted to the new version and the index is, in all respects, an "upgraded" index. The only missing piece is updating the index creation version in the commit point. I did this by exposing a method in Lucene's CommitInfos that validates the version of all segments and updates the creation version stamp in the commit point (we might need to request an API from Lucene here). When this index is opened in Solr 9.x, Solr can read it (thanks to point #1), and the same process repeats to make the index ready for Solr 10.x. (A sketch of the segment-version check follows after the quoted issue description below.)

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17725
>                 URL: https://issues.apache.org/jira/browse/SOLR-17725
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Priority: Major
>         Attachments: High Level Design.png
>
>
> Today, upgrading from Solr version X to X+2 requires complete reingestion of data from source. This stems from Lucene's constraint, which only guarantees index compatibility between the version the index was created in and the immediately following version.
> This reindexing usually comes with added downtime and/or cost. Especially for deployments that live in customer environments and are not completely under the vendor's control, the proposition of having to completely reindex the data can become a hard sell.
> I, on behalf of my employer, Commvault, have developed a way to achieve this reindexing in-place on the same index. The process also automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention.
> It comes with the following limitations:
> i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can of course be stored=false; it is the source fields (or more precisely, the source fields you care about preserving) that must have either stored or docValues set to true.
> ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine.
> For indexes where these limitations are not a problem (they weren't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds or thousands of very large indexes.
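To illustrate the version bookkeeping described in the comment above, here is a minimal sketch (not the tool itself; the class name IndexVersionCheck is hypothetical) that uses Lucene's public SegmentInfos API to read the latest commit point, print the index-creation version and each segment's version/minVersion, and check whether every segment is already on the current major version. Actually rewriting the creation version stamp in the commit point is the part that would still need the new Lucene API mentioned above.

{code:java}
// Minimal sketch: reads the latest commit point (the segments_* file) of an index
// directory and checks whether every segment is already on the current Lucene major
// version. This is only the validation half of the step described in the comment.
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexVersionCheck { // hypothetical class name

  public static void main(String[] args) throws IOException {
    try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
      // SegmentInfos represents the commit point recorded in the segments_* file.
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);

      // Major Lucene version the index was originally created with (point #3).
      System.out.println("Index created with Lucene major version: "
          + infos.getIndexCreatedVersionMajor());

      boolean allCurrent = true;
      for (SegmentCommitInfo sci : infos) {
        Version segVersion = sci.info.getVersion();     // version that wrote this segment
        Version minVersion = sci.info.getMinVersion();  // oldest version merged into it (point #2)
        System.out.println(sci.info.name + ": written=" + segVersion
            + ", minVersion=" + minVersion);
        if (segVersion.major < Version.LATEST.major) {
          allCurrent = false;
        }
      }

      System.out.println(allCurrent
          ? "All segments are on the current major version."
          : "Some segments still need to be rewritten on the new version.");
    }
  }
}
{code}

On an index that has gone through the full process in the design doc, the per-segment check should pass, leaving only the creation version stamp in the commit point to be updated.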