[ 
https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941691#comment-17941691
 ] 

Rahul Goswami commented on SOLR-17725:
--------------------------------------

[~ab] For those running SolrCloud who also have enough infrastructure 
capacity and budget, the REINDEXCOLLECTION command is a good option. I 
see that it reindexes onto a parallel collection, so for clusters with 
hundreds or thousands of large indexes the cost can be substantial. Also, 
the source collection is put in read-only mode while the reindexing happens, 
which can be a point of contention in environments that are more 
update-heavy than search-heavy (e.g., for us at Commvault). 
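
For readers unfamiliar with the command discussed above, here is a minimal 
sketch of how a REINDEXCOLLECTION request to the Collections API could be 
assembled. The base URL and the collection names ("products", 
"products_v2") are hypothetical placeholders; only the `action`, `name`, 
`cmd` and `target` parameters are used, and the helper name is made up for 
this illustration.

```python
from urllib.parse import urlencode


def reindex_collection_url(base_url, source, target=None, cmd="start"):
    """Build a Collections API URL for REINDEXCOLLECTION.

    base_url, source and target are illustrative placeholders supplied by
    the caller; this helper only builds the URL and sends nothing.
    """
    params = {"action": "REINDEXCOLLECTION", "name": source, "cmd": cmd}
    if target is not None:
        # The data is reindexed into a separate (parallel) collection.
        params["target"] = target
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local SolrCloud node and collection names.
    print(reindex_collection_url("http://localhost:8983/solr",
                                 "products", target="products_v2"))
```

The extra storage for the target collection, and the read-only source during 
the run, are the costs described above.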

With this Jira I am attempting to overcome the Lucene limitation that 
forces you to reindex from source when you really don't HAVE to. At the very 
least, I would like to offer that option to users who are more cost-sensitive 
or operationally sensitive (e.g., solutions that package Solr as part of the 
application and are installed/deployed on customer sites; it can be awkward 
to explain to customers why a solution upgrade may need downtime just because 
it involves a Solr upgrade).

The proposed solution reindexes into the same core, can be easily adapted to 
work with both standalone Solr and SolrCloud, and allows both updates and 
searches to be served while doing so. It also removes operational overhead, 
since users can focus on just the Solr upgrade without having to worry about 
index compatibility.

 

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17725
>                 URL: https://issues.apache.org/jira/browse/SOLR-17725
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Priority: Major
>         Attachments: High Level Design.png
>
>
> Today upgrading from Solr version X to X+2 requires complete reingestion of 
> data from source. This comes from Lucene's constraint which only guarantees 
> index compatibility between the version the index was created in and the 
> immediate next version. 
> This reindexing usually comes with added downtime and/or cost. Especially in 
> case of deployments which are in customer environments and not completely in 
> control of the vendor, this proposition of having to completely reindex the 
> data can become a hard sell.
> I, on behalf of my employer, Commvault, have developed a way which achieves 
> this reindexing in-place on the same index. Also, the process automatically 
> keeps "upgrading" the indexes over multiple subsequent Solr upgrades without 
> needing manual intervention. 
> It comes with the following limitations:
> i) All _source_ fields need to be either stored=true or docValues=true. Any 
> copyField destination fields can be stored=false of course, just that the 
> source fields (or more precisely, the source fields you care about 
> preserving) should be either stored or docValues true. 
> ii) The datatype of an existing field in schema.xml shouldn't change upon 
> Solr upgrade. Introducing new fields is fine. 
> For indexes where these limitations are not a problem (they weren't for 
> us!), the tool can reindex in-place on the same core with zero downtime and 
> legitimately "upgrade" the index. This can remove a lot of operational 
> headaches, especially in environments with hundreds/thousands of very large 
> indexes.
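
Limitation (i) in the description above can be checked ahead of time. The 
following is a rough sketch, not the tool's actual logic: it assumes field 
definitions shaped like the Schema API's field and copyField listings 
(fetched with defaults shown, so `stored` and `docValues` are explicit), and 
it assumes copyField destinations are never populated directly. The function 
name and sample schema are hypothetical, and dynamic fields and wildcard 
copyField rules are not handled.

```python
def fields_blocking_in_place_upgrade(fields, copy_fields):
    """Return names of source fields that are neither stored nor docValues.

    fields:      list of dicts such as {"name": ..., "stored": ..., "docValues": ...}
    copy_fields: list of dicts such as {"source": ..., "dest": ...}
    Assumption: a field that appears as a copyField destination is derived
    data and does not need to be recoverable from the index.
    """
    derived = {cf["dest"] for cf in copy_fields}
    blocking = []
    for field in fields:
        if field["name"] in derived:
            continue  # copyField destination, rebuilt from its source
        recoverable = field.get("stored", False) or field.get("docValues", False)
        if not recoverable:
            blocking.append(field["name"])
    return sorted(blocking)


if __name__ == "__main__":
    # Hypothetical schema: "body" is indexed only, so its original value
    # could not be read back from the index during an in-place reindex.
    sample_fields = [
        {"name": "id", "stored": True},
        {"name": "title", "stored": False, "docValues": True},
        {"name": "body", "stored": False, "docValues": False},
        {"name": "text_all", "stored": False},
    ]
    sample_copy_fields = [{"source": "title", "dest": "text_all"}]
    print(fields_blocking_in_place_upgrade(sample_fields, sample_copy_fields))
```

An empty result would suggest the schema meets limitation (i); limitation 
(ii), unchanged field datatypes across the upgrade, needs a separate 
comparison of the old and new schema.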



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
