[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Goswami updated SOLR-17725:
---------------------------------
    Description: 
Today, upgrading from Solr version X to X+2 requires a complete reingestion of 
data from the source. This stems from Lucene's constraint of only guaranteeing 
index compatibility between the major version an index was created in and the 
immediately following major version.
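
As a concrete illustration of that constraint, here is a minimal, hypothetical 
sketch (not part of the proposed tool; the class name and flow are made up): 
Lucene throws IndexFormatTooOldException when any segment of the index was 
written by a major version more than one behind the version doing the reading.

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexFormatTooOldException;
    import org.apache.lucene.store.FSDirectory;

    public class OpenOldIndex {
      public static void main(String[] args) throws Exception {
        // args[0] = path to an index directory created by an older Solr/Lucene
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]))) {
          // Opening succeeds only if every segment is at most one major version old
          try (DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println("Opened index, numDocs=" + reader.numDocs());
          }
        } catch (IndexFormatTooOldException e) {
          // This is the point where a full reindex from source is normally required
          System.err.println("Index format too old: " + e.getMessage());
        }
      }
    }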

This reindexing usually comes with added downtime and/or cost. Especially for 
deployments that live in customer environments and are not completely under the 
vendor's control, having to completely reindex the data can be a hard sell.

On behalf of my employer, Commvault, I have developed a way to perform this 
reindexing in place on the same index. The process also keeps "upgrading" the 
indexes automatically across multiple subsequent Solr upgrades, without any 
manual intervention.

It comes with the following limitations:
i) All _source_ fields need to be either stored=true or docValues=true. 
copyField destination fields can of course remain stored=false; it is only the 
source fields (or, more precisely, the source fields you care about preserving) 
that must have stored=true or docValues=true. See the schema sketch after this 
list.
ii) The data type of an existing field in schema.xml must not change upon Solr 
upgrade. Introducing new fields is fine.
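
For illustration only (the field names and types below are hypothetical, 
following the stock configset conventions, not taken from any real schema), 
fields satisfying limitation (i) could look like this in schema.xml:

    <!-- source field preserved because stored=true -->
    <field name="title" type="string" indexed="true" stored="true"/>

    <!-- source field preserved via docValues=true instead of stored -->
    <field name="price" type="plong" indexed="true" stored="false" docValues="true"/>

    <!-- copyField destination: may remain stored=false, it is rebuilt from its sources -->
    <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="title" dest="text_all"/>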

For indexes where these limitations are not a problem (they weren't for us!), 
the tool can reindex in place on the same core with zero downtime and 
legitimately "upgrade" the index. This can remove a lot of operational 
headaches, especially in environments with hundreds or thousands of very large 
indexes.

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17725
>                 URL: https://issues.apache.org/jira/browse/SOLR-17725
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Priority: Major
>


