Since this is an auxiliary process, for performance reasons the current
version runs without any parallelism, so as to take as few resources as
possible away from application processes. We upgrade core by core. The
process reads a core segment by segment and reindexes it onto the same
core, so the older versions of the updated documents get marked as deleted
by Lucene anyway. This helps provide continuity across restarts. There is
also some bookkeeping that maintains the status of the process (e.g.,
which core is currently being reindexed) so that the same core is picked
up again for upgrading upon restart.
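
To make the restart handling above concrete, here is a rough sketch of
what the resumable, single-threaded, core-by-core loop with a persisted
status file could look like. The class, interface, file and property
names below are hypothetical illustrations, not the actual implementation:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

// Sketch only -- names are hypothetical, not the actual patch.
public class UpgradeStatusLoop {

  private final Path statusFile; // e.g. <solr.home>/index-upgrade-status.properties

  public UpgradeStatusLoop(Path statusFile) {
    this.statusFile = statusFile;
  }

  // Hypothetical hook for the real per-core work (segment-by-segment reindex
  // onto the same core; re-adding a document by its uniqueKey makes Lucene
  // mark the old copy as deleted).
  public interface CoreReindexer {
    void reindexInPlace(String coreName) throws IOException;
  }

  // Runs strictly sequentially so the upgrade takes as little CPU/IO away
  // from application traffic as possible.
  public void run(List<String> coreNames, CoreReindexer reindexer) throws IOException {
    Properties status = load();
    for (String core : coreNames) {
      if ("done".equals(status.getProperty(core))) {
        continue; // already upgraded in an earlier run
      }
      // Persist the in-progress core before touching it, so a restart picks
      // the same core up again instead of starting somewhere else.
      status.setProperty("inProgress", core);
      save(status);

      reindexer.reindexInPlace(core);

      status.remove("inProgress");
      status.setProperty(core, "done");
      save(status);
    }
  }

  private Properties load() throws IOException {
    Properties p = new Properties();
    if (Files.exists(statusFile)) {
      try (InputStream in = Files.newInputStream(statusFile)) {
        p.load(in);
      }
    }
    return p;
  }

  private void save(Properties p) throws IOException {
    try (OutputStream out = Files.newOutputStream(statusFile)) {
      p.store(out, "index upgrade status");
    }
  }
}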

We maintain a cluster of standalone Solr nodes and don't use SolrCloud,
but since the process operates at the Solr core level it should still
apply to SolrCloud. I anticipate modifications during the PR review
process anyway, so we can evolve the feature to address any missing
cases. The overall solution has been tested and works for us.
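
Because everything operates per core, a standalone node could simply
enumerate its cores via the CoreAdmin API and feed them into the loop
above (in SolrCloud the same would be done per replica core). A rough
SolrJ sketch for illustration only; the URL, client type and class name
are placeholder assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

public class ListCoresForUpgrade {
  public static void main(String[] args) throws Exception {
    // Assumed standalone node URL; adjust for your deployment.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // CoreAdmin STATUS with no core name reports every core on the node.
      CoreAdminResponse status = CoreAdminRequest.getStatus(null, client);
      List<String> cores = new ArrayList<>();
      for (Map.Entry<String, NamedList<Object>> core : status.getCoreStatus()) {
        cores.add(core.getKey());
      }
      System.out.println("Cores to consider for upgrade: " + cores);
      // These names would then be handed to the resumable loop sketched earlier.
    }
  }
}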

I have created the JIRA below with more details and can start on the
pull requests soon :)
https://issues.apache.org/jira/browse/SOLR-17725

Also, please let me know if we should take this discussion to the Solr
dev list. Although I have contributed in the past, I don't yet have
experience coordinating a major feature like this, so I could use some
help there as well :)

Thanks,
Rahul

On Tue, Apr 1, 2025 at 8:19 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) <
lkotzanie...@bloomberg.net> wrote:

> I'm also curious whether this solution relies on a Solr cloud on version
> n-1 sending a distributed update request to a Solr cloud on version n (or
> potentially vice versa) and how stable this is .. :-)
>
> Sent from Bloomberg Professional for Android
>
> ----- Original Message -----
> From: Gus Heck <users@solr.apache.org>
> To: users@solr.apache.org
> At: 04/01/25 19:55:34 UTC-04:00
>
>
> That's interesting, but brings up the question of what happens if a node
> (or the whole cluster) is rebooted in the middle of the process?
>
> On Mon, Mar 31, 2025 at 10:02 PM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Some good points brought up in the discussion. The implementation we have
> > reindexes a shard, reading all documents onto itself, but takes care of
> > the fact that no older-version segment merges with a fresh segment.
> > This happens with zero downtime and without requiring a large storage
> > buffer. By the end of the process, you have an index which Solr
> > identifies as being "created in the newer version".
> >
> > We have tested it on 5+ TB indexes and are happy with the results. Some
> > performance hit to the application is expected, but for us it is within
> > acceptable limits. With more inputs from the community, I am sure we can
> > polish it further.
> > The goal is to have at least something which will work for a significant
> > user base, or at least have an option available to decide based on
> > individual use-cases.
> >
> > I am working on the design doc to get the discussion started and will
> > share the JIRA by tomorrow night.
> >
> > -Rahul
> >
> > On Mon, Mar 31, 2025 at 1:18 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
> > <lkotzanie...@bloomberg.net> wrote:
> >
> > > >> the only thing that makes sense is reindexing from source to a new
> > > >> cluster that will replace the old cluster
> > >
> > > Ideally yes, but there is a social aspect when Solr is managed as a
> > > service and the many sources are opaque to the team managing it.
> > > Let's assume for the sake of argument that the below is true or
> > > achievable:
> > >
> > > >> solution has enough extra capacity
> > >
> > > I am interested in this:
> > >
> > > >> Another case that might make such a thing interesting would be *if* it
> > > >> was designed to co-locate shards/replicas being reindexed and prevented
> > > >> the need for over the wire transport (caveats about hashing/routing
> > > >> changes, etc). That could speed things up significantly, and a process
> > > >> might look like
> > >
> > > Let's assume the shard routing is invariant across versions: if you
> > > were able to create these upgraded local replicas from their respective
> > > lower-version source replicas, how easily could you stitch these
> > > together again into a SolrCloud? If you were cloning from a collection
> > > that was receiving some live traffic it might be hard, because I
> > > imagine you'd need to know which replica of a particular shard was most
> > > up-to-date and ensure that replica became the leader in the new cloud.
> > > So would this effectively require some kind of special leader election
> > > logic, or at least some knowledge of the source transaction log as well?
> > >
> > > If we assume a pause to live traffic then this becomes simpler but
> > > then you have the social aspect of coordinating with many teams again.
> > >
> > > In our case, we were considering developing a dual write system with
> > > a versionField defined to ensure consistent ordering between the two
> > > clouds, n  and n+m, and having this live outside of Solr. Then, the
> > > actual backfill could be kicked off from some snapshot taken *after*
> > > we enabled dual write. And then finally deleting the old cloud once
> > > we routed traffic to the new one (and let it "bake"). As Gus points
> > > out, at "big data" scale the backfill becomes hard and so the
> > > idea of making this less resource intensive is enticing...
> > >
> > >
> > > From: users@solr.apache.org
> > > At: 03/30/25 14:59:16 UTC-4:00
> > > To: users@solr.apache.org
> > > Subject: Re: Automatic upgrade of Solr indexes over multiple versions
> > >
> > > Some thoughts:
> > >
> > > A lot depends on the use case for this sort of thing. In the case of
> > > relatively small installs that can afford to run >2x disk and have
> > > significant query latency headroom this might be useful. However, if a
> > > company is running a large cluster where maintaining excess capacity
> > > costs tens of thousands of dollars, they will often be "cutting it
> > > close" on available storage (I've seen yearly storage costs over 100k
> > > some places) and trying to maintain just enough excess query
> > > performance to handle "normal" spikes in traffic. Adding the load/disk
> > > demands of a re-index within the same cluster (making both query and
> > > indexing slower) is usually a bad idea. Even if you index from the
> > > index into a new separate cluster, that query load to pull the data
> > > from the index may place you above acceptable risk thresholds. For
> > > large clusters the only thing that makes sense is reindexing from
> > > source to a new cluster that will replace the old cluster, because in
> > > that way you can (usually) pull the data much faster without impacting
> > > the users. (Notable exceptions crop up in cases where the original
> > > source is a live database also used by the users; then some care with
> > > the query rate is needed again.)
> > >
> > > I suppose another use case could be if the cluster is being run on bare
> > > metal rather than a service like AWS or a much larger virtualization
> > > environment. In the bare metal case, spinning up new machines for
> > > temporary use is not an option, but again only if the bare metal
> > > solution has enough extra capacity.
> > >
> > > Another case that might make such a thing interesting would be *if* it
> > > was designed to co-locate shards/replicas being reindexed and prevented
> > > the need for over the wire transport (caveats about hashing/routing
> > > changes, etc). That could speed things up significantly, and a process
> > > might look like
> > >
> > >    1. Upgrade (solr will read index version -1)
> > >    2. Clone to 2x disk cluster
> > >    3. reindex into peer collection (to reset index version counter)
> > >    4. Update alias, delete original collection
> > >    5. Clone to 1x disk cluster
> > >    6. Swap and sunset original upgraded cluster.
> > >
> > > If folks have engineered an easy/efficient backup/clone cluster for
> > > steps 2 and 5, step 3 could be faster than reindexing from originals,
> > > reducing parallel run time (which could save money in large installs).
> > >
> > > Clear documentation of limitations, expected load profiles, throttling,
> > > etc. would be important in any case. It's important to consider the
> > > "Big Data" case because, if you are lucky, "Small Data" grows into "Big
> > > Data." However, the transition can be subtle and can badly trap people
> > > if it is not anticipated and well thought out.
> > >
> > > On Sun, Mar 30, 2025 at 9:21 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> > > wrote:
> > >
> > > > I’m guessing this is not simply retrieving all documents through the
> > > > API using pagination and sending them to be indexed 🤔 About being
> > > > in-place, how can it work when a new Solr version requires a
> > > > different schema or config file, because from time to time old
> > > > definitions don’t work in a new version.
> > > >
> > > > -ufuk
> > > >
> > > > —
> > > >
> > > > > On Mar 30, 2025, at 10:33, Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
> > > > > <lkotzanie...@bloomberg.net> wrote:
> > > > >
> > > > > Hi Rahul,
> > > > >
> > > > > This sounds very interesting!
> > > > >
> > > > > I enjoyed the discussion at CoC and would be very
> > > > > interested to hear more about the technical details.
> > > > >
> > > > > I am also curious to know more what you mean by "in-place"
> > > > > and what the expectation is around downtime.
> > > > >
> > > > > Either way I am sure this would be a great addition to
> > > > > the tool belt for getting people to finally move off
> > > > > ancient versions of Solr.
> > > > >
> > > > > Look forward to discussing this more on the JIRA!
> > > > >
> > > > > Luke
> > > > >
> > > > > From: users@solr.apache.org
> > > > > At: 03/28/25 01:05:57 UTC-4:00
> > > > > To: users@solr.apache.org
> > > > > Subject: Automatic upgrade of Solr indexes over multiple versions
> > > > >
> > > > > Today, upgrading from Solr version X to X+2 requires complete
> > > > > reingestion of data from source. This comes from Lucene's
> > > > > constraint, which only guarantees index compatibility between the
> > > > > version the index was created in and the immediate next version.
> > > > >
> > > > >
> > > > > This reindexing usually comes with added downtime and/or cost.
> > > > > Especially in the case of deployments which are in customer
> > > > > environments and not completely in control of the vendor, this
> > > > > proposition of having to completely reindex the data can become a
> > > > > hard sell.
> > > > >
> > > > >
> > > > > I have developed a way which achieves this reindexing in-place on
> > > > > the same index. Also, the process automatically keeps "upgrading"
> > > > > the indexes over multiple subsequent Solr upgrades without needing
> > > > > manual intervention.
> > > > >
> > > > >
> > > > > It does come with a limitation that all *source* fields need to be
> > > > > either stored=true or docValues=true. Any copyField destination
> > > > > fields can be stored=false of course, but as long as the source
> > > > > field (or in general, the fields you care about preserving) is
> > > > > either stored or docValues true, the tool can reindex in-place and
> > > > > legitimately "upgrade" the index. For indexes where this limitation
> > > > > is not a problem (it wasn't for us!), this tool can remove a lot of
> > > > > operational headaches, especially in environments with
> > > > > hundreds/thousands of very large indexes.
> > > > >
> > > > >
> > > > > I had a conversation about this with some of you during "Apache
> > > > > Community over Code 2024" in Denver, and I could sense some
> > > > > interest. If this feature sounds appealing, I would like to
> > > > > contribute it to Solr on behalf of my employer, Commvault. Please
> > > > > let me know if I should create a JIRA and get the discussion
> > > > > rolling!
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Rahul Goswami
> > > > >
> > > > >
> > > >
> > > >
> > >
> > > --
> > > http://www.needhamsoftware.com (work)
> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> > >
> > >
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
