Did you try

  solr delete -c YOUR_DUPLICATE

? I am a noob and I am just asking.

On Wed, 7 May 2025 at 09:07, Karl Stoney
<karl.sto...@autotrader.co.uk.invalid> wrote:

> Ah no worries, thanks for the reply.
> I already have a custom operator that watches the StatefulSets and does
> some admin-type stuff, so I guess I'll shim the recovery logic in there for now.
>
>
> From: Jan Høydahl <jan....@cominvent.com>
> Date: Tuesday, 6 May 2025 at 17:45
> To: users@solr.apache.org <users@solr.apache.org>
> Subject: Re: Recovery from disk failure
>
> Hi,
>
> I have seen the same happen myself, and I agree it is somewhat
> unexpected. I believe there may be some reason behind the behavior, but I
> cannot think of any right now.
>
> So what would work for you right now is to get rid of the dead replica's
> entry in ZooKeeper (try DELETEREPLICA), and then do an ADDREPLICA on the
> new empty box, which will create a new core and start syncing.
> Not sure if you are able to remove the replica in that state, but give it
> a try.
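> Something like the following, for example; this is an untested sketch, and
> the collection, shard and replica names are taken from the cluster state
> in your mail below, so double-check them against your setup:
>
>   # Remove the dead replica's registration (core_node10 on solr-1)
>   curl "http://solr-0.search-solr-next.svc.cluster.local/solr/admin/collections?action=DELETEREPLICA&collection=postcodes-006&shard=shard1&replica=core_node10"
>
>   # Re-create a replica on the now-empty node; it should sync from the leader
>   curl "http://solr-0.search-solr-next.svc.cluster.local/solr/admin/collections?action=ADDREPLICA&collection=postcodes-006&shard=shard1&node=solr-1.search-solr-next.svc.cluster.local:80_solr"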
> So, until we decide to build a different default behavior for the "startup
> on empty disk but ZK says we should have collection-A-shard-1-replica-2"
> case, the best approach would be to first move all replicas away from the
> node whose disk is being upgraded, and then move them back again.
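> For that, the REPLACENODE collections API call can drain a node in one go.
> A rough sketch (again untested, node names assumed from your example; with
> only two nodes you may need per-replica MOVEREPLICA calls instead, since
> the target node must be able to hold everything you move):
>
>   # Before the disk swap: drain solr-1 by moving its replicas to solr-0
>   curl "http://solr-0.search-solr-next.svc.cluster.local/solr/admin/collections?action=REPLACENODE&sourceNode=solr-1.search-solr-next.svc.cluster.local:80_solr&targetNode=solr-0.search-solr-next.svc.cluster.local:80_solr"
>
>   # After solr-1 is back on the new disk, run the reverse (or use
>   # MOVEREPLICA per replica to put back only what belongs on solr-1)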
>
> Jan
>
> > On 6 May 2025, at 13:55, Karl Stoney <karl.sto...@autotrader.co.uk.INVALID> wrote:
> >
> > Hi,
> > I run SolrCloud on GKE, and I'm trying to move my pods to a new disk
> > type, so each disk will be brand new. I've landed in a position I'm
> > unsure how to recover from, where the new node is not syncing data from
> > the leader.
> >
> > To explain exactly what's happening, let's say I have two nodes:
> >
> >   * solr-0
> >   * solr-1
> >
> > Both are active and fully replicated.
> > I take solr-1 down, point it at the new disk (which is empty), and bring
> > it back up.
> > The server starts fine and I can access solr-1 via the UI, but it never
> > recovers; in the "Cloud -> Graph" UI I can see the shard on solr-1 is
> > down.
> >
> > I can see it in the "Cloud -> Nodes" GUI as up; however, its collections
> > are in a funny state, for example "postcodes-006_s1r9_(down): undefined",
> > vs solr-0 which shows "postcodes-006_s1r11: 847.3Mb".
> >
> > I was expecting the node to come up, see that its disk was empty, and
> > resync its data from the leader, but instead it's just sat there doing
> > nothing.
> >
> > The fact that I'm moving to new disks is somewhat moot; more broadly,
> > this shows me that if we lost data on a node for whatever reason, it
> > wouldn't "fix itself" - which I always (maybe blindly) assumed it would,
> > because when I bring up brand-new nodes (with different names) it does.
> >
> > Could anyone advise what I've done wrong here, and what the process
> > should be to get a node to resync its data entirely?
> >
> > This is what the API shows:
> >
> >
> > "shard1":{
> >   "range":"80000000-7fffffff",
> >   "replicas":{
> >     "core_node10":{
> >       "core":"postcodes-006_shard1_replica_n9",
> >       "node_name":"solr-1.search-solr-next.svc.cluster.local:80_solr",
> >       "type":"NRT",
> >       "state":"down",
> >       "force_set_state":"false",
> >       "base_url":"http://solr-1.search-solr-next.svc.cluster.local/solr"
> >     },
> >     "core_node12":{
> >       "core":"postcodes-006_shard1_replica_n11",
> >       "node_name":"solr-0.search-solr-next.svc.cluster.local:80_solr",
> >       "type":"NRT",
> >       "state":"active",
> >       "leader":"true",
> >       "force_set_state":"false",
> >       "base_url":"http://solr-0.search-solr-next.svc.cluster.local/solr",
> >       "property.preferredleader":"true"
> >     }
> >   },
> >   "state":"active",
> >   "health":"ORANGE"
> > }
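> >
> > (For reference: assuming the output above came from the Collections API,
> > the equivalent call would be something like
> >
> >   curl "http://solr-0.search-solr-next.svc.cluster.local/solr/admin/collections?action=CLUSTERSTATUS&collection=postcodes-006"
> >
> > with the names adjusted to your cluster.)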
> >