Shawn,

On 6/13/22 14:40, Shawn Heisey wrote:
> On 6/13/2022 10:14 AM, Christopher Schultz wrote:
>> 1. Re: regular re-indexes. I've just built this into my web application so it's literally a one-click administrative background-process kick-off. I've been trying to get automatic schema-provisioning as well (see my recent posts to users@) just in case the index doesn't even exist at first. The idea is to make new application installations / DR a simpler and more automated process.

> The best option is to entirely eradicate the existing index before rebuilding it. One way to do this is to completely delete the index directory and then reload the core or restart Solr. Another way is to delete all documents and then optimize the index. Lucene will see that none of the segments contain non-deleted documents and will completely delete them all. It should be effectively equivalent to deleting the index directory and reloading. This is what my rebuild script for my current Solr install does. A full reindex only takes about ten minutes, though.
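
Just so I'm sure I follow, I assume that second route boils down to two update requests, something like this (hypothetical core name "users" on a default localhost install):

    # mark every document deleted, then optimize so Lucene drops the
    # now-empty segments entirely
    curl 'http://localhost:8983/solr/users/update?commit=true' \
         -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>*:*</query></delete>'
    curl 'http://localhost:8983/solr/users/update?optimize=true'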

In my testing for (automatically) creating cores from scratch, I found that the schema for the core seems to survive that process. Running "bin/solr delete -c corename" will delete the core and its on-disk directory, but re-creating the core somehow resurrects the old schema. I can ask about that under separate cover, but is it going to complicate this process?
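
For concreteness, the sequence I've been testing is roughly this (core name is hypothetical):

    bin/solr delete -c users
    bin/solr create -c users
    # inspect the schema the "new" core actually ends up with
    curl 'http://localhost:8983/solr/users/schema/fields'

I would have expected the create to yield a pristine schema, but it doesn't.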

Another option would be to create a core with a new name and then "swap cores" -- a process I know exists merely because there is a button for it in the admin web UI.
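
Presumably that button maps onto the CoreAdmin SWAP action, i.e. something like this (core names are made up):

    curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=users&other=users_build'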

2. "Index upgrader tool" -- I have no idea what this is. Do I need to care? Or are you saying that if I upgrade from 7.x -> 9.x I won't even be able to write to the same on-disk index artifacts at all, unless I create a new core?

> IndexUpgrader is something provided by Lucene. All it does is a forceMerge down to one segment -- equivalent to "optimize" in Solr. This upgrades the index to the current Lucene version as fully as is possible, but the version from the old segments is preserved even through the merge.

> That version preservation is why, if you try upgrading from 7.x to 9.x, you won't even be able to READ the index, much less write to it -- even if you take the intermediate step of running the 8.x IndexUpgrader. Lucene will refuse to open it.
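
For the archives: my understanding is that IndexUpgrader is run directly against the index directory with the Lucene jars on the classpath, along these lines (jar versions and paths here are my guesses, not gospel):

    java -cp lucene-core-8.11.2.jar:lucene-backward-codecs-8.11.2.jar \
         org.apache.lucene.index.IndexUpgrader \
         /var/solr/data/users/data/index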

Okay. So if I do what I initially proposed:

1. delete *:*
2. re-index everything

But otherwise leave the core alone... will I have successfully "re-built" the index, such that I don't have that old version lurking around waiting to bite me in future upgrades?

Is step 1 even necessary? If I refresh every single document in the index, would it ultimately purge old segments by (a) marking all documents in the old segments as "deleted" and (b) creating only new segments to contain the new documents? I will be "replacing" each document with an updated one: their ids will remain stable from pre-re-index to post-re-index.
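
In other words, each "replacement" would just be an ordinary add with an existing id, e.g. (core and field names are stand-ins for my real schema):

    curl -X POST 'http://localhost:8983/solr/users/update?commit=true' \
         -H 'Content-Type: application/json' \
         --data-binary '[{"id":"42","email":"user@example.com"}]'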

>> 4. Re: Complete re-build of infrastructure + cut-over: we abuse Solr a little and use it as an online system and not just a static "product catalog" or whatever. We actually use it to store application user information so we can perform quick user-searches. We have several applications all connecting to the same index and contributing updates and performing queries, so a clean switchover is difficult to do (we aren't using an intermediate proxy). I suppose introducing a proxy wouldn't be the worst possible idea.

> The way I managed this was a little involved.

> I had two complete online copies of the index, three if you count the dev server. Each copy was independently updated; I did not use replication. I used haproxy and pacemaker to float a virtual IP address between some of the servers and automatically switch to another copy of the index if the main copy went down.

> Each copy of the index had two cores for each shard -- a live core and a build core. A full rebuild would build into the build cores (wiping them as mentioned above before beginning any indexing), and then once the rebuild was completely done, swap the live cores with the build cores.

> In cloud mode, you cannot follow that paradigm precisely. Instead, you would just create a new collection for a rebuild, and once it was ready, update a collection alias to point to the new collection.
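
(If I ever move to cloud mode, I take it that alias flip is a single Collections API call, e.g. with made-up names:

    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=users&collections=users_20220613'

so clients keep querying "users" and never notice the cut-over.)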

Okay. I have one-and-only-one Solr node at this point (which is sufficient for my current needs), so it's a little simpler than the deployment you describe above. The one monkey wrench is that the "online core" could theoretically get updates while the "build core" is being re-generated from scratch. That isn't a problem if the re-index hasn't yet reached the user who was updated during that interval, but if a user who has already been re-indexed gets updated, the build core ends up stale for that user.
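
If I do adopt the build-core approach, I could probably close that window with a catch-up pass: record a timestamp just before the rebuild starts, and after the swap re-index anyone modified since then. A rough sketch, where "reindex-users" is a hypothetical stand-in for my app's admin endpoint:

    REBUILD_START=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
    # ... full rebuild into the build core, then SWAP ...
    reindex-users --modified-since "$REBUILD_START"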

As it stands right this moment, the re-index and operational changes are being made in real time on the exact same core, so assuming there isn't a disaster, the index will always be consistent and up-to-date, and I don't have to do any post-re-index re-re-indexing of anything that may have been left behind or missed during the process.

Given the simple (possibly bordering on naive) process above, what steps would I need to take to ensure that the resulting core state is usable by Solr N+1, etc. in the future?

Thanks,
-chris
