Re: Re-index after upgrade

Gus Heck Sun, 12 Jun 2022 14:58:07 -0700

What Thomas said, if possible...

Definitely set up a test system if you have the resources. Building a new
index from scratch ensures that nothing is lurking unconverted and
allows you to move to a newer index format. One specific cost of
re-indexing into the old index is that the index upgrader tool will
continue to refuse to operate on it. That tool is meant for temporary
transitions only, and won't touch an index that has *ever* been written by
a version more than one major version older than the tool itself. I'd only
reuse the same index if I were trapped into it by lack of hardware
resources or the enormity of the reindex time (these things do happen, but
they make life harder and add risk).

Indexing directly into the old index without clearing it means you can't
accept any field changes (depending on how old things are, avoiding them
may require care). It also means the index keeps being written in old
formats, so that new documents stay compatible when their segments are
merged with segments that still contain old documents while you index.
This sort of thing is typically fine for short-distance upgrades (within a
major version), but when leaping major versions it gets increasingly risky
as the version gap grows. This is driving straight into corner-case
territory where bugs may be found. If you have to, you have to (back up
your index and *verify* the backup!), but if you don't have to, don't do it.
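For a standalone core like the one in the original question, one way to take that backup is Solr's replication handler (`command=backup`), and one cheap way to *verify* it is to restore it into a scratch core and compare document counts from a `q=*:*&rows=0` query. This is only a sketch: the host, the core names (`mycore`, `scratch`), the backup location and the snapshot name are hypothetical placeholders, the scratch core is assumed to already exist with the same schema, and the requests are only built here, not sent. A matching count is a sanity check, not proof that the backup is complete.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: adjust to your own installation.
SOLR = "http://localhost:8983/solr"

def replication_request(core, command, **params):
    """Build a GET request for the core's ReplicationHandler (e.g. backup, restore)."""
    query = urllib.parse.urlencode({"command": command, **params})
    return urllib.request.Request(f"{SOLR}/{core}/replication?{query}")

def count_request(core):
    """Build a request that returns only the number of documents in the core."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return urllib.request.Request(f"{SOLR}/{core}/select?{query}")

def num_found(body):
    """Extract response.numFound from a JSON /select response body."""
    return json.loads(body)["response"]["numFound"]

def fetch(req):
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage against a live server (not run here). Backup and restore run
# asynchronously, so check command=details / command=restorestatus before
# comparing counts.
# fetch(replication_request("mycore", "backup", location="/backups", name="pre-upgrade"))
# fetch(replication_request("scratch", "restore", location="/backups", name="pre-upgrade"))
# assert num_found(fetch(count_request("mycore"))) == num_found(fetch(count_request("scratch")))
```

Comparing counts is the same integrity check suggested further down the thread for the finished reindex, so one helper can serve both purposes.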

If you do find a reason to delete *:* (meaning you've intentionally
accepted downtime for the duration of your reindex, but have neither the
space nor the hardware to create a second index, let alone a second
system), you may want to forceMerge (optimize) afterward to ensure old
segments are cleared out ahead of indexing. Do the delete and the merge on
the old version, just to avoid any possible oddities with new code handling
ancient data (unlikely, but...). This should be fast, since segments that
contain 100% deleted docs don't actually need to be merged and the code
will shortcut it. I've seen a sub-second forceMerge in such cases
(specifically on 8.6, with a previously merged single-segment index where I
indexed some docs, deleted all the new docs, and forceMerged to reset to
the original conditions). Of course, on much older versions YMMV; as
always, test on a test index first.

In an ideal world, in an environment like AWS where machines are easily
exchanged, one would simply recreate the entire system from scratch, test
the new system, cut over, and then sunset the old instances. In an
infrastructure-as-a-service setting there's only a brief cost increase,
which is usually a big win versus the risk. Especially if you hold on to
the old instances for a little while, you have a very easy rollback in case
of disaster. If it's not mission critical and downtime or performance
degradation is perfectly OK, you could of course use the same machine. If
you are not on AWS/Azure/Google Cloud etc. and are using on-prem physical
hardware, then you should balance cost and consider your capacities,
including your machine upgrade policies, to see what makes the most sense
for you. One thing to be sure not to do is run your production server low
on disk space while building a new index. Also, high indexing rates will
impact performance on the machine, so you may need to throttle the rate at
which you send your updates if you really must work with only one machine.
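If one machine has to keep serving queries while the new index is built, a crude client-side throttle caps how fast updates are sent. This is a hypothetical helper, not a Solr feature: `send_batch` stands for whatever actually posts a batch to `/update`, and the batch size and rate are placeholders to tune against observed query latency.

```python
import time
from typing import Callable, List, Sequence

def send_throttled(
    docs: Sequence[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 500,
    max_docs_per_sec: float = 1000.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs in batches, sleeping so the average rate stays at or below
    max_docs_per_sec. Returns the number of batches sent."""
    start = clock()
    sent = 0
    batches = 0
    for i in range(0, len(docs), batch_size):
        batch = list(docs[i:i + batch_size])
        send_batch(batch)
        sent += len(batch)
        batches += 1
        # Earliest moment at which `sent` docs are allowed at the target rate.
        target = start + sent / max_docs_per_sec
        delay = target - clock()
        if delay > 0:
            sleep(delay)
    return batches
```

Sleeping against a cumulative target, rather than pausing a fixed amount after each batch, keeps the average rate steady even when individual requests are slow: a slow request simply shortens the next sleep.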

Finally, you should consider how to make upgrading and re-indexing a more
regular activity and ensure it's a well-known, smooth process with
acceptable (budgeted) costs. Regular upgrades are desirable for security
reasons even if you don't have feature-driven reasons.

Best,
Gus

On Sun, Jun 12, 2022 at 3:10 PM Thomas Corthals <[email protected]> wrote:

> Or if you have the resources, set up a separate machine for the new Solr
> version and reindex and test against that one before switching.
>
> On Sun, 12 Jun 2022 at 20:21, Dave <[email protected]> wrote:
>
> > You don’t need a new core/collection; just reindex everything again.
> > Ideally, since you’re using standalone (way better than cloud imo), you
> > can use the same indexer; just do an integrity check after the fact to
> > make sure the document counts are the same. You don’t really need to do
> > that delete if you are just going to obliterate the previous install and index.
> >
> > > On Jun 12, 2022, at 1:49 PM, Christopher Schultz <[email protected]> wrote:
> > >
> > > All,
> > >
> > > We've been using the same major version of Solr for years so haven't had
> > > to do this yet, but we are preparing to upgrade between major versions,
> > > now.
> > >
> > > After upgrading, I'm assuming that the existing index is "usable" but
> > > I've read many times that "you should reindex after a major version
> > > change."
> > >
> > > Okay.
> > >
> > > Does that just mean:
> > >
> > > 1. delete *:*
> > > 2. re-add all documents
> > >
> > > ?
> > >
> > > Or do we have to create a new core/collection with the schema from
> > > scratch and load it?
> > >
> > > I'm using standalone Solr (i.e. no ZK) with a single core if that makes
> > > any difference.
> > >
> > > Thanks,
> > > -chris
> >
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
