Daniel Shahaf wrote on Tue, 06 Jul 2021 01:23 +00:00: > Julian Foad wrote on Tue, Jun 29, 2021 at 11:32:24 +0100: > > Daniel Shahaf wrote: > > > Julian Foad wrote on Fri, Jun 25, 2021 at 14:27:30 +0100: > > There could be two angles to view this issue. One angle is looking at > > the specifics of why WD's replicator wants different behaviour, which > > I will go into further below. > > > > Another angle could be looking from a software design point of view > > and saying of course code like this should clean up like this. I am > > hopeful that someone may be able to see it from this angle and say of > > course this change is right in principle, and maybe point out if I got > > the specifics right or missed something. > > > > I'm viewing this from the latter angle: as a matter of libsvn_fs_fs's > behaviour, viewed as a single, reusable component. WD's needs are > interesting as context but are not to be considered design constraints > on potential fixes. (For one, I suspect designing an API to WD's needs > would risk ASF's 501(c)3 status.) > > > > > DESIRED BEHAVIOUR > > > > > > > > The commit re-try should fail in exactly the same way as the first try, > > > > if the underlying cause of the error persists. > > > > > > > > Or, if the underlying cause of the error was transient and has gone > > > > away, then the commit should succeed. > > > > > > I think it's possible that the second try would legitimately fail with > > > a different error message than the first try. > > > > Yes, that remains a third possibility, where the underlying cause of > > the error changed. The problematic cases are the other two. > > > > In the case of a persistent underlying error condition, as best I > > understand, the concern is that the replicator should be able to > > report the error condition consistently. Why does it matter? Remember > ⋮ > > Bear in mind that the replicator does not know about the specifics of > > all the possible svn error codes, it needs to apply this logic > > uniformly to all error responses (and success responses) from all > > APIs. > > I decline to respond as that would constitute design discussion of > non-FLOSS product(s).
…and as such, would be off-topic for this list, being an ASF list. > > > Thus, it's not clear what the desired _change_ in behaviour is. > > > > Is that clearer now? > > > > No. > > > > > PROPOSED FIX > > > > > > > > My proposed fix is to unlock the lock when exiting that code section, > > > > no matter whether successful or throwing an error. > > > > > > > > Pseudo-Python: > > > > > > > > try: > > > > 1. Get a write handle on the proto revision file. > > > > 2. Complete the construction of the proto-rev file. > > > > 3. Move the finished rev file into place. > > > > finally: > > > > 4. Unlock the proto-rev file. > > > > 5. Continue with the rest of the commit finalization. > > > > > > > > > > Why would this be correct? Why would this have no downside? > > > > Would it be easier to see from this angle: that would make the behaviour > > for a re-try within one calling process equivalent to the existing > > behaviour that would occur if the FSFS machine restarts and a new calling > > process re-tries. > > And are we sure that codepath is crash-safe? >