Re: Implications of duplicate UUIDs on a server

Doug Robinson Wed, 30 Apr 2025 09:00:46 -0700

Andreas, et al.:

On Wed, Apr 30, 2025 at 4:17 AM Andreas Stieger <andreas.stie...@gmx.de>
wrote:

>
> On 2025-04-29 18:53, LWChris wrote:
> > Therefore we suspect it was some kind of caching issue in the SVN
> > server (WANdisco) due to same path same UUID same commit number; after
> > restarting the SVN server, the issue went away. But I don't know if
> > WANdisco is a "typical server implementation", or if the issue was
> > something deeper, or if the issues are related at all or pure
> > coincidence, etc.
>
> WANdisco is not a typical Subversion server implementation. For
> synchronous multi-site replication (as opposed to svn sync which is
> asynchronous), it provides a proxy layer with a consensus protocol
> (think Paxos, wsrep, Raft). Notably, the replication requires the
> representation of candidate transactions in a serialized format.
> It is conceivable that non-unique UUIDs may cause hiccups, but I would
> not exclude the possibility that this is also a general problem of plain
> svn.
>

First, there are 2 flavors of WANdisco Subversion bits:

    1. "vanilla"
    2. "replicated"

The "vanilla" bits are exactly the unmodified Apache Subversion source
distribution compiled, tested and packaged for each supported
OS flavor.  They are distributed free of charge to anyone and
everyone.  See here [0].

The "replicated" bits are EXACTLY a typical Subversion server implementation
up until the point of the actual transaction commit execution.
Nearly everything else is identical.  Specifically, the READ-ONLY
side of the server is COMPLETELY IDENTICAL to typical Subversion.
That includes ALL of the Apache caching.  Nothing that we do touches
the read-path - that was a critical part of the design principle.

In terms of the WANdisco "WRITE PATH", there are the normal Subversion
repo-UUIDs for the repos, but they play an almost negligible part
in the update process.  Each repository is associated with its own
"distributed state machine" (DSM) that has its own UUID (never
repeated) and it is that DSM that is specifically tasked with making the
updates occur.  Many of our customers have a LOT of non-unique
repository UUIDs and I have never seen a single hiccup due to that
sort of issue from the perspective of repository updates.

I have not yet seen in this discussion any disclosure of the version
of Subversion or Apache.  Could that information be added to the
conversation?  It makes a difference since in earlier versions of
Subversion there was only a single UUID for each repository; now
there are 2.  Part of that, IIRC, was to enable some better cache
invalidation so that {path,repo-UUID} was not the only distinguishing
factor since it was causing confusion in the long-lived Apache
process (I'm sure someone will correct me if I'm wrong).  So the
answer of "path-only" or "{path,repo-UUID}" or
"{path,repo-UUID,repo-UUID2}" is likely dependent on the version
of Subversion (or at least some of the conversations I've read in
the past have made it seem that way).
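
To make the cache-key question concrete, here is a toy Python model.  It
is NOT Subversion's or Apache's actual cache code, and every name in it
(ToyCache, read_node, the "instance" field, the ID strings) is
hypothetical.  It only sketches why a key of {path,repo-UUID} can return
stale data once a repository is recreated at the same path with the same
UUID, and why a second identifier in the key would avoid that.

```python
# Toy model only: this is NOT how Subversion or Apache implement caching.
# All names here (ToyCache, read_node, the ID strings) are hypothetical.

class ToyCache:
    def __init__(self, key_fields):
        self.key_fields = key_fields  # which repo attributes form the cache key
        self.entries = {}

    def read_node(self, repo, node_path, rev):
        key = tuple(repo[f] for f in self.key_fields) + (node_path, rev)
        if key not in self.entries:
            # Cache miss: fetch from the (simulated) on-disk repository.
            self.entries[key] = repo["data"][(node_path, rev)]
        return self.entries[key]


old_repo = {"path": "/srv/svn/demo", "uuid": "U1", "instance": "I1",
            "data": {("/trunk/a.txt", 1): "old contents"}}
# Same path, same UUID, same revision number, but recreated with new contents.
new_repo = {"path": "/srv/svn/demo", "uuid": "U1", "instance": "I2",
            "data": {("/trunk/a.txt", 1): "new contents"}}

# Key = {path, repo-UUID}: the recreated repository hits the old entry.
two_part = ToyCache(["path", "uuid"])
two_part.read_node(old_repo, "/trunk/a.txt", 1)
stale = two_part.read_node(new_repo, "/trunk/a.txt", 1)

# Key = {path, repo-UUID, second ID}: the recreated repository misses.
three_part = ToyCache(["path", "uuid", "instance"])
three_part.read_node(old_repo, "/trunk/a.txt", 1)
fresh = three_part.read_node(new_repo, "/trunk/a.txt", 1)

print(stale)
print(fresh)
```

In this model the only remedy for the two-part key is to throw the cache
away, which is what restarting the long-lived server process amounts to.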

All of that said, the use case that was enumerated previously is
definitely broken in terms of Subversion itself (nothing to do with
WANdisco).  Creating the same repository on-disk in the same
path using the same repo-UUID and then populating it with even
remotely similar contents will definitely confuse the Apache
cache.  The same thing will happen if you ever restore a
repository from backup.  When those types of operations occur you
must restart Apache in order to clear its cache.
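
For anyone who wants to check how many repositories on a server share a
UUID, a minimal Python sketch follows.  It assumes FSFS repositories
that all sit directly under one parent directory, and that each keeps
its repository UUID on the first line of <repo>/db/uuid, with newer
filesystem formats adding a second ID on the next line.  The function
names are hypothetical, and the layout assumption should be verified
against your own Subversion version before relying on the result.

```python
from collections import defaultdict
from pathlib import Path


def read_uuid_file(repo_dir):
    """Return (uuid, second_id or None) from <repo>/db/uuid, or None if absent."""
    uuid_file = Path(repo_dir) / "db" / "uuid"
    if not uuid_file.is_file():
        return None
    lines = uuid_file.read_text().split()
    if not lines:
        return None
    return lines[0], (lines[1] if len(lines) > 1 else None)


def find_duplicate_uuids(parent_dir):
    """Map each repository UUID used more than once to the repo names using it."""
    by_uuid = defaultdict(list)
    for child in sorted(Path(parent_dir).iterdir()):
        ids = read_uuid_file(child)
        if ids is not None:
            by_uuid[ids[0]].append(child.name)
    return {u: names for u, names in by_uuid.items() if len(names) > 1}
```

Calling find_duplicate_uuids on the parent directory returns a dict such
as {uuid: [repo names]} for every UUID used more than once.  The
supported way to read a single repository's UUID is "svnlook uuid
REPOS_PATH", and "svnadmin setuuid REPOS_PATH" assigns a fresh one.
Neither replaces the Apache restart described above.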

Finally, to prevent any confusion going forward, WANdisco changed
its company name to Cirata in 2024.

Cheers.

Doug

[0] https://cirata.com/resources/support/subversion-binaries
-- 
*Doug Robinson*  Senior Product Manager
P +1 925 396 1125
*E* doug.robin...@cirata.com
