[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source

Jira Thu, 03 Apr 2025 04:49:08 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940666#comment-17940666
 ]


Jan Høydahl edited comment on SOLR-17725 at 4/3/25 11:48 AM:
-------------------------------------------------------------

Please clarify your intent with this Jira before continuing with any code 
contributions. While I think such a feature would benefit many Solr users, it 
would be sad to spend lots of time on a particular direction / implementation 
before higher-level questions / designs are clarified. As such, you did the 
correct thing by starting a mailing list thread and a JIRA.

My initial questions:
 * Do you intend for this to be a new Solr API, and if so, what is the proposed 
API? Or a CLI utility to run on a cold index folder?
 * Is one of your design goals to avoid the need for 2-3x disk space during the 
reindex, since you work on segment level and do merges?
 * Requiring a Lucene API change is a potential blocker. I'd not be surprised if 
the Lucene project rejects making the "created-version" property writable, so 
such a discussion with them should come early.
 * Obviously a new Solr API needs to play well with SolrCloud as well as other 
features such as shard split / move etc. Have you thought about locking / 
conflicts?
 * A reindex-collection API is probably wanted; however, it could be acceptable 
to implement a "core-level" API first and later add a "collection-level" API on 
top of it.
 * Challenge the assumption that "in-place" segment level is the best choice 
for this feature. Re-indexing into a new collection due to major schema changes 
is also a common use case that this will not address.


was (Author: janhoy):
Please clarify your intent with this Jira before continuing with any code 
contributions. While I think such a feature would benefit many Solr users, it 
would be sad to spend lots of time on a particular direction / implementation 
before higher level questions / designs are clarified. As such, you did the 
correct thing starting a mailing list thread and a JIRA.

My initial questions:
 * Do you intend for this to be a new Solr API, if so what is the proposed API? 
or a CLI utility tool to run on a cold index folder?
 * Is one of your design goals to avoid the need for 2-3x disk space during the 
reindex, since you work on segment level and do merges
 * Requiring Lucene API change is a potential blocker, I'd not be surprised if 
the Lucene project rejects making the "created-version" property writable, so 
such a discussion with them would come early
 * Obviously a new Solr API needs to play well with SolrCloud as well as other 
features such as shard split / move etc. It could however be acceptable to 
implement a "core-level" API first and later a "cluster-level" on top of it
 * Challenge the assumption that "in-place" segment level is the best choice 
for this feature. Re-index into a new collection due to major schema changes is 
also a common use case that this will not address

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17725
>                 URL: https://issues.apache.org/jira/browse/SOLR-17725
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Priority: Major
>         Attachments: High Level Design.png
>
>
> Today, upgrading from Solr major version X to X+2 requires complete 
> reingestion of data from source. This comes from Lucene's constraint, which 
> only guarantees index compatibility between the major version the index was 
> created in and the immediate next major version. 
> This reindexing usually comes with added downtime and/or cost. Especially in 
> case of deployments which are in customer environments and not completely in 
> control of the vendor, this proposition of having to completely reindex the 
> data can become a hard sell.
> I, on behalf of my employer, Commvault, have developed a way which achieves 
> this reindexing in-place on the same index. Also, the process automatically 
> keeps "upgrading" the indexes over multiple subsequent Solr upgrades without 
> needing manual intervention. 
> It comes with the following limitations:
> i) All _source_ fields need to be either stored=true or docValues=true. Any 
> copyField destination fields can be stored=false of course, just that the 
> source fields (or more precisely, the source fields you care about 
> preserving) should be either stored or docValues true. 
> ii) The datatype of an existing field in schema.xml shouldn't change upon 
> Solr upgrade. Introducing new fields is fine. 
> For indexes where this limitation is not a problem (it wasn't for us!), the 
> tool can reindex in-place on the same core with zero downtime and 
> legitimately "upgrade" the index. This can remove a lot of operational 
> headaches, especially in environments with hundreds/thousands of very large 
> indexes.
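
The two limitations listed in the quoted description can be read as a pre-flight check on the schema. The following is only a minimal sketch of that check, not the reporter's tool: the `Field` class and the `upgrade_blockers` function are hypothetical names invented for illustration, and the field definitions are assumed to have been extracted from the old and new schema by some other means.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified view of one schema field."""
    name: str
    field_type: str
    stored: bool = False
    doc_values: bool = False
    is_copy_dest: bool = False  # True if the field is only a copyField destination


def upgrade_blockers(old_fields: List[Field], new_fields: List[Field]) -> List[str]:
    """Return reasons why an in-place reindex would not preserve the data.

    Limitation i): every source field must be stored or have docValues.
    Limitation ii): an existing field must keep its datatype after the upgrade.
    """
    problems = []
    new_by_name = {f.name: f for f in new_fields}
    for f in old_fields:
        # copyField destinations can be rebuilt from their sources, so skip them.
        if not f.is_copy_dest and not (f.stored or f.doc_values):
            problems.append(f"{f.name}: source field is neither stored nor docValues")
        new = new_by_name.get(f.name)
        if new is not None and new.field_type != f.field_type:
            problems.append(
                f"{f.name}: type changed from {f.field_type} to {new.field_type}"
            )
    return problems


old_schema = [
    Field("id", "string", stored=True),
    Field("body", "text_general"),                     # violates limitation i)
    Field("all", "text_general", is_copy_dest=True),   # fine: copyField destination
    Field("price", "pint", doc_values=True),
]
new_schema = [
    Field("id", "string", stored=True),
    Field("price", "plong", doc_values=True),          # violates limitation ii)
    Field("extra", "string"),                          # new fields are allowed
]

for problem in upgrade_blockers(old_schema, new_schema):
    print(problem)
```

An empty result would mean neither limitation applies to the given schema pair; it says nothing about the other open questions raised in the comment, such as locking or SolrCloud interaction.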



