Re: SVN-4877 FSFS commit failure should release txn proto-rev lock

Daniel Shahaf Mon, 05 Jul 2021 18:24:09 -0700

Julian Foad wrote on Tue, Jun 29, 2021 at 11:32:24 +0100:
> Daniel Shahaf wrote:
> > Julian Foad wrote on Fri, Jun 25, 2021 at 14:27:30 +0100:
> There could be two angles to view this issue. One angle is looking at
> the specifics of why WD's replicator wants different behaviour, which
> I will go into further below.
> 
> Another angle could be looking from a software design point of view
> and saying of course code like this should clean up like this. I am
> hopeful that someone may be able to see it from this angle and say of
> course this change is right in principle, and maybe point out if I got
> the specifics right or missed something.
>


I'm viewing this from the latter angle: as a matter of libsvn_fs_fs's
behaviour, viewed as a single, reusable component.  WD's needs are
interesting as context but are not to be considered design constraints
on potential fixes.  (For one, I suspect designing an API to WD's needs
would risk ASF's 501(c)3 status.)

> > > DESIRED BEHAVIOUR
> > > 
> > > The commit re-try should fail in exactly the same way as the first try, 
> > > if the underlying cause of the error persists.
> > > 
> > > Or, if the underlying cause of the error was transient and has gone away, 
> > > then the commit should succeed.
> > 
> > I think it's possible that the second try would legitimately fail with
> > a different error message than the first try.
> 
> Yes, that remains a third possibility, where the underlying cause of
> the error changed. The problematic cases are the other two.
> 
> In the case of a persistent underlying error condition, as best I
> understand, the concern is that the replicator should be able to
> report the error condition consistently. Why does it matter? Remember
⋮
> Bear in mind that the replicator does not know about the specifics of
> all the possible svn error codes, it needs to apply this logic
> uniformly to all error responses (and success responses) from all
> APIs.

I decline to respond as that would constitute design discussion of
non-FLOSS product(s).

> > Thus, it's not clear what the desired _change_ in behaviour is.
> 
> Is that clearer now?
> 

No.

> > > PROPOSED FIX
> > > 
> > > My proposed fix is to unlock the lock when exiting that code section, no 
> > > matter whether successful or throwing an error.
> > > 
> > > Pseudo-Python:
> > > 
> > >     try:
> > >         1. Get a write handle on the proto revision file.
> > >         2. Complete the construction of the proto-rev file.
> > >         3. Move the finished rev file into place.
> > >     finally:
> > >         4. Unlock the proto-rev file.
> > >     5. Continue with the rest of the commit finalization.
> > > 
> > 
> > Why would this be correct?  Why would this have no downside?
> 
> Would it be easier to see from this angle: that would make the behaviour for 
> a re-try within one calling process equivalent to the existing behaviour that 
> would occur if the FSFS machine restarts and a new calling process re-tries.

And are we sure that codepath is crash-safe?

Re: SVN-4877 FSFS commit failure should release txn proto-rev lock

Reply via email to