Re: SVN-4877 FSFS commit failure should release txn proto-rev lock

Daniel Shahaf Mon, 05 Jul 2021 19:25:15 -0700

Daniel Shahaf wrote on Tue, 06 Jul 2021 01:23 +00:00:
> Julian Foad wrote on Tue, Jun 29, 2021 at 11:32:24 +0100:
> > Daniel Shahaf wrote:
> > > Julian Foad wrote on Fri, Jun 25, 2021 at 14:27:30 +0100:
> > There could be two angles to view this issue. One angle is looking at
> > the specifics of why WD's replicator wants different behaviour, which
> > I will go into further below.
> > 
> > Another angle could be looking from a software design point of view
> > and saying of course code like this should clean up like this. I am
> > hopeful that someone may be able to see it from this angle and say of
> > course this change is right in principle, and maybe point out if I got
> > the specifics right or missed something.
> > 
> 
> I'm viewing this from the latter angle: as a matter of libsvn_fs_fs's
> behaviour, viewed as a single, reusable component.  WD's needs are
> interesting as context but are not to be considered design constraints
> on potential fixes.  (For one, I suspect designing an API to WD's needs
> would risk ASF's 501(c)3 status.)
> 
> > > > DESIRED BEHAVIOUR
> > > > 
> > > > The commit re-try should fail in exactly the same way as the first try, 
> > > > if the underlying cause of the error persists.
> > > > 
> > > > Or, if the underlying cause of the error was transient and has gone 
> > > > away, then the commit should succeed.
> > > 
> > > I think it's possible that the second try would legitimately fail with
> > > a different error message than the first try.
> > 
> > Yes, that remains a third possibility, where the underlying cause of
> > the error changed. The problematic cases are the other two.
> > 
> > In the case of a persistent underlying error condition, as best I
> > understand, the concern is that the replicator should be able to
> > report the error condition consistently. Why does it matter? Remember
> ⋮
> > Bear in mind that the replicator does not know about the specifics of
> > all the possible svn error codes, it needs to apply this logic
> > uniformly to all error responses (and success responses) from all
> > APIs.
> 
> I decline to respond as that would constitute design discussion of
> non-FLOSS product(s).


…and as such, would be off-topic for this list, being an ASF list.

> > > Thus, it's not clear what the desired _change_ in behaviour is.
> > 
> > Is that clearer now?
> > 
> 
> No.
> 
> > > > PROPOSED FIX
> > > > 
> > > > My proposed fix is to unlock the lock when exiting that code section, 
> > > > no matter whether successful or throwing an error.
> > > > 
> > > > Pseudo-Python:
> > > > 
> > > >     try:
> > > >         1. Get a write handle on the proto revision file.
> > > >         2. Complete the construction of the proto-rev file.
> > > >         3. Move the finished rev file into place.
> > > >     finally:
> > > >         4. Unlock the proto-rev file.
> > > >     5. Continue with the rest of the commit finalization.
> > > > 
> > > 
> > > Why would this be correct?  Why would this have no downside?
> > 
> > Would it be easier to see from this angle: that would make the behaviour 
> > for a re-try within one calling process equivalent to the existing 
> > behaviour that would occur if the FSFS machine restarts and a new calling 
> > process re-tries.
> 
> And are we sure that codepath is crash-safe?
>

Re: SVN-4877 FSFS commit failure should release txn proto-rev lock

Reply via email to