[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940666#comment-17940666 ]
Jan Høydahl edited comment on SOLR-17725 at 4/3/25 11:48 AM:
-------------------------------------------------------------

Please clarify your intent with this Jira before continuing with any code contributions. While I think such a feature would benefit many Solr users, it would be sad to spend lots of time on a particular direction / implementation before higher-level questions / designs are clarified. As such, you did the correct thing by starting a mailing list thread and a JIRA.

My initial questions:
* Do you intend for this to be a new Solr API, and if so, what is the proposed API? Or a CLI utility tool to run on a cold index folder?
* Is one of your design goals to avoid the need for 2-3x disk space during the reindex, since you work on segment level and do merges?
* Requiring a Lucene API change is a potential blocker. I'd not be surprised if the Lucene project rejects making the "created-version" property writable, so such a discussion with them should come early.
* Obviously a new Solr API needs to play well with SolrCloud as well as other features such as shard split / move etc. Have you thought about locking / conflicts?
* A reindex-collection API is probably wanted; however, it could be acceptable to implement a "core-level" API first and later add a "collection-level" API on top of it.
* Challenge the assumption that "in-place" segment level is the best choice for this feature. Re-indexing into a new collection due to major schema changes is also a common use case that this will not address.

was (Author: janhoy):
Please clarify your intent with this Jira before continuing with any code contributions. While I think such a feature would benefit many Solr users, it would be sad to spend lots of time on a particular direction / implementation before higher-level questions / designs are clarified. As such, you did the correct thing by starting a mailing list thread and a JIRA.
My initial questions:
* Do you intend for this to be a new Solr API, and if so, what is the proposed API? Or a CLI utility tool to run on a cold index folder?
* Is one of your design goals to avoid the need for 2-3x disk space during the reindex, since you work on segment level and do merges?
* Requiring a Lucene API change is a potential blocker. I'd not be surprised if the Lucene project rejects making the "created-version" property writable, so such a discussion with them should come early.
* Obviously a new Solr API needs to play well with SolrCloud as well as other features such as shard split / move etc. It could however be acceptable to implement a "core-level" API first and later a "cluster-level" API on top of it.
* Challenge the assumption that "in-place" segment level is the best choice for this feature. Re-indexing into a new collection due to major schema changes is also a common use case that this will not address.

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17725
>                 URL: https://issues.apache.org/jira/browse/SOLR-17725
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Priority: Major
>         Attachments: High Level Design.png
>
>
> Today upgrading from Solr version X to X+2 requires complete reingestion of
> data from source. This comes from Lucene's constraint which only guarantees
> index compatibility between the version the index was created in and the
> immediate next version.
> This reindexing usually comes with added downtime and/or cost. Especially in
> case of deployments which are in customer environments and not completely in
> control of the vendor, this proposition of having to completely reindex the
> data can become a hard sell.
> I, on behalf of my employer, Commvault, have developed a way which achieves
> this reindexing in-place on the same index.
> Also, the process automatically
> keeps "upgrading" the indexes over multiple subsequent Solr upgrades without
> needing manual intervention.
> It comes with the following limitations:
> i) All _source_ fields need to be either stored=true or docValues=true. Any
> copyField destination fields can be stored=false of course, just that the
> source fields (or more precisely, the source fields you care about
> preserving) should be either stored or docValues true.
> ii) The datatype of an existing field in schema.xml shouldn't change upon
> Solr upgrade. Introducing new fields is fine.
> For indexes where this limitation is not a problem (it wasn't for us!), the
> tool can reindex in-place on the same core with zero downtime and
> legitimately "upgrade" the index. This can remove a lot of operational
> headaches, especially in environments with hundreds/thousands of very large
> indexes.
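
Limitation (i) in the issue description is a rule that can be checked mechanically against a schema. The following is only a rough sketch of such a check, not part of the proposed tool: the function name `fields_blocking_upgrade` and the sample schema are hypothetical, wildcard copyField destinations and dynamic fields are ignored, and an attribute missing from both the field and its fieldType is treated as false, which is deliberately conservative and does not reproduce Solr's real defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample schema, for illustration only.
SAMPLE_SCHEMA = """<schema name="demo" version="1.6">
  <fieldType name="string" class="solr.StrField" docValues="true"/>
  <fieldType name="text" class="solr.TextField"/>
  <field name="id" type="string" stored="true"/>
  <field name="title" type="text" stored="true"/>
  <field name="body" type="text" stored="false"/>
  <field name="all_text" type="text" stored="false"/>
  <copyField source="title" dest="all_text"/>
</schema>"""


def _flag(element, name, default):
    """Read a boolean attribute, falling back to `default` when absent."""
    value = element.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


def fields_blocking_upgrade(schema_xml):
    """Return names of explicit fields that are neither stored nor docValues.

    Assumption: copyField destinations are exempt, as the issue description
    says they may be stored=false. Missing attributes count as false here.
    """
    root = ET.fromstring(schema_xml)

    # Collect (stored, docValues) as declared on each fieldType.
    type_flags = {}
    for field_type in root.iter("fieldType"):
        type_flags[field_type.get("name")] = (
            _flag(field_type, "stored", False),
            _flag(field_type, "docValues", False),
        )

    # Exact-name copyField destinations; wildcard patterns are not expanded.
    copy_destinations = {cf.get("dest") for cf in root.iter("copyField")}

    blocking = []
    for field in root.iter("field"):
        name = field.get("name")
        if name in copy_destinations:
            continue
        type_stored, type_doc_values = type_flags.get(field.get("type"), (False, False))
        # Attributes on the field itself override those on its fieldType.
        stored = _flag(field, "stored", type_stored)
        doc_values = _flag(field, "docValues", type_doc_values)
        if not (stored or doc_values):
            blocking.append(name)
    return blocking


if __name__ == "__main__":
    print(fields_blocking_upgrade(SAMPLE_SCHEMA))
```

In the sample, `body` is reported because it is neither stored nor docValues, while `all_text` is skipped as a copyField destination. Limitation (ii), unchanged field datatypes across the upgrade, would need a comparison of two schema versions and is not covered here.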