Julian Foad wrote on Tue, Jun 29, 2021 at 11:32:24 +0100: > Daniel Shahaf wrote: > > Julian Foad wrote on Fri, Jun 25, 2021 at 14:27:30 +0100: > There could be two angles to view this issue. One angle is looking at > the specifics of why WD's replicator wants different behaviour, which > I will go into further below. > > Another angle could be looking from a software design point of view > and saying of course code like this should clean up like this. I am > hopeful that someone may be able to see it from this angle and say of > course this change is right in principle, and maybe point out if I got > the specifics right or missed something. >
I'm viewing this from the latter angle: as a matter of libsvn_fs_fs's behaviour, viewed as a single, reusable component. WD's needs are interesting as context but are not to be considered design constraints on potential fixes. (For one, I suspect designing an API to WD's needs would risk ASF's 501(c)3 status.) > > > DESIRED BEHAVIOUR > > > > > > The commit re-try should fail in exactly the same way as the first try, > > > if the underlying cause of the error persists. > > > > > > Or, if the underlying cause of the error was transient and has gone away, > > > then the commit should succeed. > > > > I think it's possible that the second try would legitimately fail with > > a different error message than the first try. > > Yes, that remains a third possibility, where the underlying cause of > the error changed. The problematic cases are the other two. > > In the case of a persistent underlying error condition, as best I > understand, the concern is that the replicator should be able to > report the error condition consistently. Why does it matter? Remember ⋮ > Bear in mind that the replicator does not know about the specifics of > all the possible svn error codes, it needs to apply this logic > uniformly to all error responses (and success responses) from all > APIs. I decline to respond as that would constitute design discussion of non-FLOSS product(s). > > Thus, it's not clear what the desired _change_ in behaviour is. > > Is that clearer now? > No. > > > PROPOSED FIX > > > > > > My proposed fix is to unlock the lock when exiting that code section, no > > > matter whether successful or throwing an error. > > > > > > Pseudo-Python: > > > > > > try: > > > 1. Get a write handle on the proto revision file. > > > 2. Complete the construction of the proto-rev file. > > > 3. Move the finished rev file into place. > > > finally: > > > 4. Unlock the proto-rev file. > > > 5. Continue with the rest of the commit finalization. > > > > > > > Why would this be correct? Why would this have no downside? > > Would it be easier to see from this angle: that would make the behaviour for > a re-try within one calling process equivalent to the existing behaviour that > would occur if the FSFS machine restarts and a new calling process re-tries. And are we sure that codepath is crash-safe?