Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

Štefan Miklošovič Mon, 12 May 2025 23:58:26 -0700

David,

that "force" you copied from nodetool, that has nothing to do with "force"
in these snapshots but I think you already know that.


org.apache.cassandra.repair.RepairJobDesc#determanisticId is only in 4.1+
included, not in 4.0. That's quite a bummer but we could limit the patch
for 4.1+ only and leave 4.0 be.

When we do that, then "force" will become redundant and we can perhaps
remove it?

How I look at it is that the introduction of "force" flag in
TableRepairManager is the result of a developer trying to consolidate the
name of snapshots from parent id / session id to both parent id / parent id
but there was the realisation of that being problematic because else branch
might be invoked multiple times so it would start to clash. So they said
"ok and here we just override and take no matter what" but the
implementation of that was not finished as it clearly fails.

So if we moved to determanisticId for both branches of if / else and remove
force, we should be good? RepairJobDesc#deterministicId is basically
unique? I see it is created from parent id, session id, keyspace, table and
ranges. I do not think there would ever be two cases of snapshots like that
with exact same values.

WDYT?

Regards

On Mon, May 12, 2025 at 6:55 PM David Capwell <[email protected]> wrote:

> > "force" can be true / false. When true, then it will not check if a
> snapshot exists
>
>
> My mental node for “force” was only to deal with down nodes, and nothing
> to do with snapshots… so this feels like a weird behavior
>
> @Option(title = "force", name = {"-force", "--force"}, description = "Use
> -force to filter out down endpoints”)
>
> Our documentation even only calls out dealing with downed enodes…
>
>
> >   a) we will remove existing snapshot and take a new one
>
> So this is racy and can make repairs brittle.  The snapshot is at the
> RepairSession (global) or RepairJob level (a subset of ranges for a single
> table, different RepairJobs can work on the same table, just different
> ranges). With this code path you also have 1 more variable at play that
> makes things complex: isGlobal
>
> public boolean isGlobal() {return dataCenters.isEmpty() &&
> hosts.isEmpty();}
>
> Global does the full range, where as non-global covers the partial range
>
> So when you do host or dc this snapshot is isolated to the RepairJob and
> not the session logically, but physically its isolated to the session;
> which doesn’t make any sense.
>
> > 2) if we do not want to fix "force", then it is most probably just
> redundant and we would remove it but it order to not fail, we would need to
> go back to naming snapshots not parent id / parent id but parent id /
> session id for global and non-global repair respectively (basically we
> would return to pre-14116 behavior).
>
> org.apache.cassandra.repair.RepairJobDesc#determanisticId
>
> This is scoped to a single RepairJob, so wouldn’t have the issues with
> concurrency.  So if we were going to alter what the snapshot name is, I
> would strongly prefer this (its also present in the repair vtable, so not
> some hidden UUID)
>
> I personally feel the lowest risk is to switch to the job name and away
> from the session name we had… I do wonder about removing this “force”
> semantic as its not documented to users as far as I can tell, so is there
> any place that defines this behavior?
>
> > we might just fix the trunk and keep the rest broken, not ideal but
>
> With some patches its hard to tell if something is a bug or a feature… so
> altering the semantic is that making an improvement to allow better
> combinations of repair options, or is it fixing a bug (aka worthy of a back
> port)…. I know most users do parallel repair (default) so this snapshot
> logic doesn’t impact most users, but its also why its been so long before
> this issue got noticed… so coverage around snapshots currently isn’t the
> best.
>
> Sorry for this reply, its less of an answer and more a dump while I think
> through the problem you stated….
>
> > On May 12, 2025, at 9:18 AM, Štefan Miklošovič <[email protected]>
> wrote:
> >
> > Hello,
> >
> > I am looking for advice for (1). There is quite a quality archaeology to
> go through. I have asked around already (thanks for that!) but nobody seems
> to ultimately remember / tell what it is for.
> >
> > There is (2). "force" can be true / false. When true, then it will not
> check if a snapshot exists. If it exists and "force" is true, this has
> never worked - that is what CASSANDRA-20490 is for. It would try to hard
> link twice and it would error out.
> >
> > This "force" logic is called here (3). If ParentRepairSession (prs)  is
> global, "force" is false, but if it is not (not global means that some dc
> was specified), then it is true.
> >
> > Notice that both branches of that if/else use
> desc.parentSessionId.toString() as snapshots' name.
> >
> > This "force" logic was introduced in (4) but it seems to me that it
> never worked. AFAICT this was just broken on delivery.
> >
> > One commit before (4), it looked like (5). The first branch of "if" used
> parentSessionId as a snapshot name and it checked if it exists or not
> before taking it, the else branch took sessionId (not parent session id)
> and took a snapshot of that name without any check if it exists or not.
> >
> > From javadoc on TableRepairManager, "force" means: (introduced in (4)
> CASSANDRA-14116)
> >
> > "If force is true, a snapshot should be taken even if one already exists
> with that name.".
> >
> > Questions:
> >
> > 1) do we want to fix the behaviour of "force"? (patch 20490)
> >   if the answer is yes, what is the "semantics" about this. What does
> "force" mean exactly? It can mean
> >   a) we will remove existing snapshot and take a new one
> >   b) we will add more SSTables onto existing snapshot, we just hardlink
> more SSTables into same snapshot (that is what 20490 is doing)
> >
> > 2) if we do not want to fix "force", then it is most probably just
> redundant and we would remove it but it order to not fail, we would need to
> go back to naming snapshots not parent id / parent id but parent id /
> session id for global and non-global repair respectively (basically we
> would return to pre-14116 behavior).
> >
> > I think the introduction of "force" flag was there because there might
> be multiple calls to "else" branch in RepairMessageVerbHandler and calling
> it all over again with parentSessionId would be erroneous when already
> there, so "force" was there for not caring and just taking a snapshot
> again. If we go back to sessionId only, then we do not have this problem I
> guess.
> >
> > Is it important for people to keep the name of a snapshot in both cases
> (global / not global) equal to parentSessionId? Can we just go back?
> >
> > If we want to keep naming snapshots as done now, we might just do 20490,
> I was trying to fix older branches as well but it is very difficult, we
> might just fix the trunk and keep the rest broken, not ideal but ... I am
> just reinventing a wheel in 4.0+, it is just too cumbersome / complex to
> copy trunk's behaviour in 20490 to older stuff.
> >
> > Thanks and regards
> >
> > (1) https://issues.apache.org/jira/browse/CASSANDRA-20490
> > (2)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L84
> > (3)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192
> > (4) https://issues.apache.org/jira/browse/CASSANDRA-14116
> > (5)
> https://github.com/apache/cassandra/blob/478c1a9fdf027af0f373f9e26521803ae8775fdb/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L106-L121
>
>

Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

Reply via email to