Gus,

On 6/12/22 17:57, Gus Heck wrote:
What Thomas said, if possible...

Definitely set up a test system if you have the resources. Building a new
index from scratch ensures that nothing is lurking unconverted and
allows you to move to a newer index format. One specific cost of
re-indexing into the old index is that the index upgrader tool will
continue to refuse to operate on it. That tool is meant for temporary
transitions only, and won't touch an index that has *ever* been written by
a version more than one major version older. I'd only re-use the same
index if I were forced into it by a lack of hardware resources or by a
prohibitively long reindex time (these things do happen, but they make
life harder and add risk).
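For concreteness, the tool in question is Lucene's IndexUpgrader class,
run from the command line against the index directory. A rough sketch of
building that invocation (the jar filename and index path below are
placeholders, not your actual layout):

```python
# Sketch: build the command line for Lucene's IndexUpgrader.
# It only bridges a single major version, which is why it refuses an
# index that was ever written by anything older than that.
def index_upgrader_cmd(lucene_core_jar, index_dir, delete_prior_commits=True):
    cmd = ["java", "-cp", lucene_core_jar,
           "org.apache.lucene.index.IndexUpgrader"]
    if delete_prior_commits:
        # Keep only the upgraded commit point
        cmd.append("-delete-prior-commits")
    cmd.append(index_dir)
    return cmd

# Example invocation (paths are hypothetical):
print(" ".join(index_upgrader_cmd(
    "lucene-core-8.11.2.jar",
    "/var/solr/data/mycore/data/index")))
```

Stop Solr (or at least close the core) before running it, since it
rewrites the index in place.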

Indexing directly into the old index without clearing it means you can't
accept any field changes, and depending on how old things are, that may
require care to avoid. It also implies writing the index in old formats,
so that new documents remain compatible when their segments are merged
with segments containing old documents while you index. This sort of
thing is typically fine for short-distance upgrades (within a major
version), but when leaping major versions it gets increasingly risky as
the version differential increases. This is driving straight into
corner-case territory where bugs may be found. If you have to, you have
to (back up your index and *verify* the backup!), but if you don't have
to, don't do it.
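On the back-up-and-verify point: with a standalone core, one option is
the replication handler's backup command. A minimal sketch of building
that request (the base URL, core name, and snapshot name are
placeholders):

```python
from urllib.parse import urlencode

def backup_request_url(solr_base, core, backup_name):
    # The replication handler's "backup" command snapshots the
    # core's index on disk; you'd then verify the snapshot before
    # attempting anything risky.
    params = urlencode({"command": "backup", "name": backup_name})
    return f"{solr_base}/{core}/replication?{params}"

print(backup_request_url("http://localhost:8983/solr",
                         "mycore", "pre-upgrade"))
```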

If you do find a reason to delete *:* (meaning you've intentionally
accepted downtime for the duration of your reindex, but have neither the
space nor the hardware to create a second index, let alone a second
system), you may want to forceMerge (optimize) afterward to ensure old
segments are cleared out ahead of indexing. Do the delete and the merge
on the old version, just to avoid any possible oddities with new code
handling ancient data (unlikely, but...). This should be fast, since
segments with 100% deleted docs don't actually need to be merged and the
code will short-circuit them. I've seen a sub-second forceMerge in such
cases (specifically on 8.6, with a previously merged single-segment
index where I indexed some docs, deleted all the new docs, and
forceMerged to reset to the original conditions). Of course, on much
older versions YMMV; as always, test on a test index first.
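The delete-everything-then-merge sequence maps to two requests against
the update handler. A sketch of building them (base URL and core name
are placeholders):

```python
import json

def delete_all_request(solr_base, core):
    # Delete-by-query matching every document, committing immediately
    url = f"{solr_base}/{core}/update?commit=true"
    body = json.dumps({"delete": {"query": "*:*"}})
    return url, body

def force_merge_request(solr_base, core):
    # forceMerge (optimize) down to one segment; segments that are
    # 100% deleted are simply dropped, which is why this is fast
    # right after a delete *:*
    url = f"{solr_base}/{core}/update?optimize=true&maxSegments=1"
    return url, "{}"  # empty JSON body

url, body = delete_all_request("http://localhost:8983/solr", "mycore")
print(url)
print(body)
```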

In an ideal world, in an environment like AWS where machines are easily
exchanged, one would simply recreate the entire system from scratch,
test the new creation, cut over, and then sunset the old instances. In
an infrastructure-as-a-service setting there's only a brief cost
increase, which is usually a big win vs. the risk. Especially if you
hold the old instances for a little while, you have a very easy rollback
in case of disaster. If it's not mission critical and downtime or
performance degradation is perfectly OK, you could use the same machine,
of course. If you are not on AWS/Azure/GCloud etc. and are using on-prem
physical hardware, then of course you should balance cost and consider
your capacities, including your machine upgrade policies, to see what
makes the most sense for you. One thing to be sure not to do is run your
production server low on disk space while building a new index. Also,
high indexing rates will impact performance on the machine, so you may
need to throttle the rate at which you send your updates if you really
must work with only one machine.
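If you do end up reindexing on the one production machine, the
throttling can be as simple as pacing your update batches. A minimal
sketch (the batch size and rate are things you'd tune against your own
hardware, not recommendations):

```python
import time

def throttled_batches(docs, batch_size, max_batches_per_sec):
    """Yield docs in fixed-size batches, sleeping so that no more
    than max_batches_per_sec batches are produced per second."""
    interval = 1.0 / max_batches_per_sec
    for i in range(0, len(docs), batch_size):
        started = time.monotonic()
        yield docs[i:i + batch_size]
        leftover = interval - (time.monotonic() - started)
        if leftover > 0:
            time.sleep(leftover)

# Each batch here would be POSTed to /update; printed for illustration.
for batch in throttled_batches(list(range(10)),
                               batch_size=4, max_batches_per_sec=50):
    print(batch)
```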

Finally, you should consider how to make upgrading and re-indexing a
more regular activity and ensure it's a well-known, smooth process with
acceptable (budgeted) costs. Regular upgrades are desirable for security
reasons even if you don't have feature-driven reasons.

Great info. A few questions/comments:

1. Re: regular re-indexes. I've just built this into my web application so it's literally a one-click administrative background-process kick-off. I've been trying to get automatic schema-provisioning as well (see my recent posts to users@) just in case the index doesn't even exist at first. The idea is to make new application installations / DR a simpler and more automated process.

2. "Index upgrader tool" -- I have no idea what this is. Do I need to care? Or are you saying that if I upgrade from 7.x -> 9.x I won't even be able to write to the same on-disk index artifacts at all, unless I create a new core?

3. Re regular upgrades: yes, we've always kept current... within the major version. We are running 7.x currently; probably moving to 8.x or possibly 9.x if testing doesn't show any specific issues.

4. Re: Complete re-build of infrastructure + cut-over: we abuse Solr a little and use it as an online system and not just a static "product catalog" or whatever. We actually use it to store application user information so we can perform quick user-searches. We have several applications all connecting to the same index and contributing updates and performing queries, so a clean switchover is difficult to do (we aren't using an intermediate proxy). I suppose introducing a proxy wouldn't be the worst possible idea.
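For what it's worth, the schema-provisioning piece in point 1 can be
driven through Solr's Schema API. A sketch of building an add-field
request (the field definition below is a hypothetical example, not a
recommendation):

```python
import json

def add_field_request(solr_base, core, field_def):
    # Schema API: POST {"add-field": {...}} to /<core>/schema.
    # Requires a managed schema (the default in recent Solr versions).
    url = f"{solr_base}/{core}/schema"
    body = json.dumps({"add-field": field_def})
    return url, body

url, body = add_field_request(
    "http://localhost:8983/solr", "mycore",
    {"name": "user_email", "type": "string",
     "stored": True, "indexed": True})
print(url)
print(body)
```

A GET on the same /schema endpoint returns the current schema, which is
how you'd check whether the core and fields already exist before
creating them.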

Thanks,
-chris

On Sun, Jun 12, 2022 at 3:10 PM Thomas Corthals <tho...@klascement.net>
wrote:

Or if you have the resources, set up a separate machine for the new Solr
version and reindex and test against that one before switching.

Op zo 12 jun. 2022 20:21 schreef Dave <hastings.recurs...@gmail.com>:

You don’t need a new core/collection; just reindex everything again.
Ideally, since you’re using standalone (way better than cloud, imo), you
can use the same indexer; just do an integrity check after the fact to
make sure the document counts are the same. You don’t really need to do
that delete if you are just going to obliterate the previous install and
index.

On Jun 12, 2022, at 1:49 PM, Christopher Schultz <
ch...@christopherschultz.net> wrote:

All,

We've been using the same major version of Solr for years, so we haven't
had to do this yet, but we are now preparing to upgrade between major
versions.

After upgrading, I'm assuming that the existing index is "usable" but
I've read many times that "you should reindex after a major version
change."

Okay.

Does that just mean:

1. delete *:*
2. re-add all documents

?

Or do we have to create a new core/collection with the schema from
scratch and load it?

I'm using standalone Solr (i.e. no ZK) with a single core if that makes
any difference.

Thanks,
-chris



