That's interesting, but it raises the question: what happens if a node
(or the whole cluster) is rebooted in the middle of the process?

On Mon, Mar 31, 2025 at 10:02 PM Rahul Goswami <rahul196...@gmail.com>
wrote:

> Some good points brought up in the discussion. The implementation we have
> reindexes a shard onto itself by reading all of its documents, while
> making sure that no older-version segment merges with a freshly written
> segment.
> This happens with zero downtime and without requiring a large storage
> buffer. By the end of the process, you have an index that Solr identifies
> as being "created in the newer version".
>
> We have tested it on 5+ TB indexes and are happy with the results. Some
> performance hit to the application is expected, but for us it is within
> acceptable limits. With more inputs from the community, I am sure we can
> polish it further.
> The goal is to have something that will work for a significant part of
> the user base, or at least to have an option available so people can
> decide based on their individual use cases.
>
> I am working on the design doc to get the discussion started and will share
> the JIRA by tomorrow night.
>
> -Rahul
>
> On Mon, Mar 31, 2025 at 1:18 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) <
> lkotzanie...@bloomberg.net> wrote:
>
> > >> the only thing that makes sense is reindexing from source to a new
> > >> cluster that will replace the old cluster
> >
> > Ideally yes, but there is a social aspect when Solr is managed as a
> > service and the many sources are opaque to the team managing it.
> > Let's assume for the sake of argument that the below is true or
> > achievable:
> >
> > >> solution has enough extra capacity
> >
> > I am interested in this:
> >
> > >> Another case that might make such a thing interesting would be *if*
> > >> it was designed to co-locate shards/replicas being reindexed and
> > >> prevented the need for over the wire transport (caveats about
> > >> hashing/routing changes, etc). That could speed things up
> > >> significantly, and a process might look like
> >
> > Let's assume the shard routing is invariant across versions. If you
> > were able to create these upgraded local replicas from their respective
> > lower-version source replicas, how easily could you stitch them together
> > again into a SolrCloud? If you were cloning from a collection that was
> > receiving some live traffic, it might be hard, because I imagine you'd
> > need to know which replica of a particular shard was most up-to-date
> > and ensure that replica became the leader in the new cloud. So would
> > this effectively require some kind of special leader election logic,
> > or at least some knowledge of the source transaction log as well?
> >
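> > Not an answer, but to make the question concrete: below is a minimal
> > sketch of the "pick the freshest replica" step, assuming writes are
> > paused and assuming the highest _version_ value per replica is a
> > good-enough freshness signal. The URL, collection name, and overall
> > flow are placeholders rather than a worked-out design:
> >
> >     import requests
> >
> >     SOLR = "http://localhost:8983/solr"   # assumed new-cloud URL
> >     COLL = "products_upgraded"            # placeholder collection
> >
> >     # Fetch the shard/replica layout for the collection.
> >     state = requests.get(f"{SOLR}/admin/collections", params={
> >         "action": "CLUSTERSTATUS", "collection": COLL}).json()
> >     shards = state["cluster"]["collections"][COLL]["shards"]
> >
> >     for shard_name, shard in shards.items():
> >         freshest, freshest_version = None, -1
> >         for replica_name, replica in shard["replicas"].items():
> >             # Highest _version_ on this replica only (distrib=false).
> >             core_url = f"{replica['base_url']}/{replica['core']}"
> >             resp = requests.get(f"{core_url}/select", params={
> >                 "q": "*:*", "rows": 1, "fl": "_version_",
> >                 "sort": "_version_ desc", "distrib": "false"}).json()
> >             docs = resp["response"]["docs"]
> >             version = docs[0]["_version_"] if docs else 0
> >             if version > freshest_version:
> >                 freshest, freshest_version = replica_name, version
> >         # Prefer the freshest replica as leader for this shard.
> >         requests.get(f"{SOLR}/admin/collections", params={
> >             "action": "ADDREPLICAPROP", "collection": COLL,
> >             "shard": shard_name, "replica": freshest,
> >             "property": "preferredLeader", "property.value": "true"})
> >
> >     # Trigger leader elections that honor preferredLeader.
> >     requests.get(f"{SOLR}/admin/collections", params={
> >         "action": "REBALANCELEADERS", "collection": COLL})
> >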
> > If we assume a pause to live traffic then this becomes simpler but
> > then you have the social aspect of coordinating with many teams again.
> >
> > In our case, we were considering developing a dual-write system, with
> > a versionField defined to ensure consistent ordering between the two
> > clouds, n and n+m, and having this live outside of Solr. Then, the
> > actual backfill could be kicked off from some snapshot taken *after*
> > we enabled dual writes. Finally, the old cloud would be deleted once we
> > had routed traffic to the new one (and let it "bake"). As Gus points
> > out, at "big data" scale the backfill becomes hard, and so the idea of
> > making this less resource intensive is enticing...
> >
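> > To make the dual-write idea more concrete, here is a rough sketch of
> > what the external writer could look like, assuming both clouds already
> > have an update chain with DocBasedVersionConstraintsProcessorFactory
> > keyed on a hypothetical ext_version_l field (the URLs and the field
> > name are made up):
> >
> >     import time
> >     import requests
> >
> >     OLD_CLOUD = "http://old-solr:8983/solr/products/update"
> >     NEW_CLOUD = "http://new-solr:8983/solr/products/update"
> >
> >     def dual_write(doc):
> >         # Attach a monotonically increasing external version so both
> >         # clouds apply updates in the same effective order, even if
> >         # one of them sees the request late or during the backfill.
> >         doc["ext_version_l"] = int(time.time() * 1000)
> >         for cloud in (OLD_CLOUD, NEW_CLOUD):
> >             requests.post(cloud, json=[doc],
> >                           params={"commitWithin": 10000})
> >
> >     dual_write({"id": "doc-1", "title_s": "example document"})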
> >
> > From: users@solr.apache.org At: 03/30/25 14:59:16 UTC-4:00 To:
> > users@solr.apache.org
> > Subject: Re: Automatic upgrade of Solr indexes over multiple versions
> >
> > Some thoughts:
> >
> > A lot depends on the use case for this sort of thing. In the case of
> > relatively small installs that can afford to run >2x disk and have
> > significant query latency headroom, this might be useful. However, if a
> > company is running a large cluster where maintaining excess capacity
> > costs tens of thousands of dollars, they will often be "cutting it
> > close" on available storage (I've seen yearly storage costs over 100k
> > in some places) and trying to maintain just enough excess query
> > capacity to handle "normal" spikes in traffic. Adding the load/disk
> > demands of a re-index within the same cluster (making both querying and
> > indexing slower) is usually a bad idea. Even if you reindex from the
> > existing index into a new, separate cluster, the query load needed to
> > pull the data out of the index may place you above acceptable risk
> > thresholds. For large clusters the only thing that makes sense is
> > reindexing from source to a new cluster that will replace the old
> > cluster, because that way you can (usually) pull the data much faster
> > without impacting the users. (Notable exceptions crop up in cases where
> > the original source is a live database also used by the users; then
> > some care with the query rate is needed again.)
> >
> > I suppose another use case could be if the cluster is being run on bare
> > metal rather than a service like AWS or a much larger Virtualization
> > environment. In the bare metal case spinning up new machines for
> > temporary use is not an option, but again only if the bare metal
> > solution has enough extra capacity.
> >
> > Another case that might make such a thing interesting would be *if* it
> > was designed to co-locate shards/replicas being reindexed and prevented
> > the need for over the wire transport (caveats about hashing/routing
> > changes, etc). That could speed things up significantly, and a process
> > might look like
> >
> >    1. Upgrade (Solr will read indexes from version N-1)
> >    2. Clone to a 2x-disk cluster
> >    3. Reindex into a peer collection (to reset the index version counter)
> >    4. Update the alias, delete the original collection
> >    5. Clone to a 1x-disk cluster
> >    6. Swap and sunset the original upgraded cluster.
> >
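> > As a sketch of how steps 3 and 4 above might be driven, assuming Solr
> > 8.1+ (where REINDEXCOLLECTION exists) and placeholder collection/alias
> > names; on older versions step 3 would instead be a client-side copy:
> >
> >     import requests
> >
> >     SOLR = "http://localhost:8983/solr/admin/collections"  # assumed
> >     OLD, NEW, ALIAS = "products_v8", "products_v9", "products"
> >
> >     def collections_api(params):
> >         resp = requests.get(SOLR, params=params)
> >         resp.raise_for_status()
> >         return resp.json()
> >
> >     # Step 3: reindex into a peer collection (server-side copy).
> >     collections_api({"action": "REINDEXCOLLECTION",
> >                      "name": OLD, "target": NEW})
> >
> >     # Step 4: point the serving alias at the new collection, then
> >     # drop the old one.
> >     collections_api({"action": "CREATEALIAS",
> >                      "name": ALIAS, "collections": NEW})
> >     collections_api({"action": "DELETE", "name": OLD})
> >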
> > If folks have engineered an easy/efficient backup/clone-cluster process
> > for steps 2 and 5, step 3 could be faster than reindexing from the
> > original sources, reducing parallel run time (which could save money in
> > large installs).
> >
> > Clear documentation of limitations, expected load profiles, throttling,
> > etc. would be important in any case. It's important to consider the
> > "Big Data" case because, if you are lucky, "Small Data" grows into "Big
> > Data." However, the transition can be subtle and can badly trap people
> > if it is not anticipated and well thought out.
> >
> > On Sun, Mar 30, 2025 at 9:21 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> > wrote:
> >
> > > I’m guessing this is not simply retrieving all documents through the
> > > API using pagination and sending them to be indexed 🤔 About being
> > > in-place, how can it work when a new Solr version requires a different
> > > schema or config file? From time to time, old definitions don’t work
> > > in a new version.
> > >
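> > > (For reference, the naive "pull everything through the API" approach
> > > I have in mind is cursorMark-style deep paging, roughly like the
> > > sketch below with made-up URLs; the question is whether the in-place
> > > tool avoids this entirely.)
> > >
> > >     import requests
> > >
> > >     SOURCE = "http://old-solr:8983/solr/mycoll/select"
> > >     TARGET = "http://new-solr:8983/solr/mycoll/update"
> > >
> > >     cursor = "*"
> > >     while True:
> > >         # cursorMark paging needs a sort on the uniqueKey field.
> > >         page = requests.get(SOURCE, params={
> > >             "q": "*:*", "rows": 500, "sort": "id asc",
> > >             "fl": "*", "cursorMark": cursor}).json()
> > >         docs = page["response"]["docs"]
> > >         if docs:
> > >             for d in docs:
> > >                 # Drop the internal optimistic-concurrency version.
> > >                 d.pop("_version_", None)
> > >             requests.post(TARGET, json=docs,
> > >                           params={"commitWithin": 10000})
> > >         next_cursor = page["nextCursorMark"]
> > >         if next_cursor == cursor:  # no more results
> > >             break
> > >         cursor = next_cursor
> > >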
> > > -ufuk
> > >
> > > —
> > >
> > > > On Mar 30, 2025, at 10:33, Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
> > > > <lkotzanie...@bloomberg.net> wrote:
> > > >
> > > > Hi Rahul,
> > > >
> > > > This sounds very interesting!
> > > >
> > > > I enjoyed the discussion at CoC and would be very
> > > > interested to hear more about the technical details.
> > > >
> > > > I am also curious to know more about what you mean by "in-place"
> > > > and what the expectation is around downtime.
> > > >
> > > > Either way I am sure this would be a great addition to
> > > > the tool belt for getting people to finally move off
> > > > ancient versions of Solr.
> > > >
> > > > Look forward to discussing this more on the JIRA!
> > > >
> > > > Luke
> > > >
> > > > From: users@solr.apache.org At: 03/28/25 01:05:57 UTC-4:00 To:
> > > > users@solr.apache.org
> > > > Subject: Automatic upgrade of Solr indexes over multiple versions
> > > >
> > > > Today upgrading from Solr version X to X+2 requires complete
> > > > reingestion of data from source. This comes from Lucene's constraint
> > > > which only guarantees index compatibility between the version the
> > > > index was created in and the immediate next version.
> > > >
> > > >
> > > > This reindexing usually comes with added downtime and/or cost.
> > > > Especially in the case of deployments that live in customer
> > > > environments and are not completely in the vendor's control, this
> > > > proposition of having to completely reindex the data can become a
> > > > hard sell.
> > > >
> > > >
> > > > I have developed a way that achieves this reindexing in-place on
> > > > the same index. Also, the process automatically keeps "upgrading"
> > > > the indexes over multiple subsequent Solr upgrades without needing
> > > > manual intervention.
> > > >
> > > >
> > > > It does come with a limitation that all *source* fields need to be
> > > > either stored=true or docValues=true. Any copyField destination
> > > > fields can be stored=false of course, but as long as the source
> > > > field (or in general, the fields you care about preserving) is
> > > > either stored or docValues true, the tool can reindex in-place and
> > > > legitimately "upgrade" the index. For indexes where this limitation
> > > > is not a problem (it wasn't for us!), this tool can remove a lot of
> > > > operational headaches, especially in environments with
> > > > hundreds/thousands of very large indexes.
> > > >
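> > > > As a quick illustration (not part of the tool itself), one
> > > > hypothetical way to check whether an index meets this limitation is
> > > > to walk the Schema API output and flag fields that are neither
> > > > stored nor docValues; the URL and collection name are placeholders:
> > > >
> > > >     import requests
> > > >
> > > >     SOLR = "http://localhost:8983/solr"   # assumed URL
> > > >     COLL = "mycoll"                       # placeholder collection
> > > >
> > > >     # showDefaults=true reports effective values, including
> > > >     # properties inherited from the field type.
> > > >     fields = requests.get(
> > > >         f"{SOLR}/{COLL}/schema/fields",
> > > >         params={"showDefaults": "true"}).json()["fields"]
> > > >
> > > >     # Fields that cannot be rebuilt from the index itself.
> > > >     # (copyField destinations could be filtered out via
> > > >     # /schema/copyfields in a fuller check.)
> > > >     at_risk = [f["name"] for f in fields
> > > >                if not f.get("stored", False)
> > > >                and not f.get("docValues", False)]
> > > >
> > > >     print("Not recoverable in-place:", at_risk or "none")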
> > > >
> > > > I had a conversation about this with some of you during "Apache
> > > > Community over Code 2024" in Denver, and I could sense some
> > > > interest. If this feature sounds appealing, I would like to
> > > > contribute it to Solr on behalf of my employer, Commvault. Please
> > > > let me know if I should create a JIRA and get the discussion
> > > > rolling!
> > > >
> > > >
> > > > Thanks,
> > > > Rahul Goswami
> > > >
> > > >
> > >
> > >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> >
> >
> >
>


-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
