On Fri, Jan 20, 2023 at 7:18 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>
> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > I can complete the work on this branch and bring it to a production-ready
> > state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>
> Objections have been raised, been left unanswered, and now
> implementation work has commenced following the original design.  That's
> not acceptable.  I'm vetoing the change until a non-rubber-stamp design
> discussion has been completed on the public dev@ list.


I think we can start by discussing some of the pros and cons.

There are two separate things here but they end up being mixed
together in the discussions:

1. Pros/cons of switching from SHA1 to another hash.
2. Supporting different hash types in f32.

Regarding the first item:

Do we need to switch from SHA1 to another hash? One con that was
already mentioned [1] is that we'll never really be able to switch
away from SHA1, as there are existing clients, servers, and working
copies out there. Not only will we have to support SHA1 forever for
backwards compatibility, but any new hash that is ever added will need
to be supported forever as well. If we accumulate many of those, it
might become a burden, but perhaps there will be only one new hash and
it will be the "blessed" one for the next 20 years.

There were concerns about collisions; since the space of possible
input datasets is infinite and the hash code size is fixed and finite
(pretty large, but very much finite), there will always be collisions
with any hash. The significant questions are: how small is the
probability of a collision, and (for the purposes of security) how
hard is it to generate input data that produces a collision? The
answer to the first question is fixed for a given hash; the answer to
the second is expected to change over time, as algorithms are studied
and new vulnerabilities are found. Which hash type do you pick, and who
knows if a hash thought to be very strong today will later prove easier
to crack than one thought to be weaker? We can only guess.
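
To put a rough number on the first question, here is a minimal sketch of
the standard birthday-bound approximation for accidental collisions. It
assumes the hash behaves like a uniform random function, so it says
nothing about the second question (deliberately constructed collisions).
The function name and the example file counts are mine, purely for
illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items distinct inputs
    share a hash value, for an ideal hash with hash_bits of output.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    exponent = n_items * (n_items - 1) / (2 * 2 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (160, 256):  # 160 bits is the SHA1 output size
        p = collision_probability(10 ** 9, bits)
        print(f"{bits}-bit hash, 1e9 distinct files: p ~= {p:.3g}")
```

Under that assumption the accidental case is already vanishingly
unlikely at 160 bits; a longer hash shrinks an already tiny number.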

Taking a step back, this discussion started because pristine-free WCs
are IIUC more dependent on comparing hashes than pristineful WCs, and
therefore a hash collision could have more impact in a pristine-free
WC. "Guarantees" were mentioned, but I think it's important to state
that there's only a guarantee of probability, since as mentioned above
all hashes will have collisions.

We already can't store files with identical SHA1 hashes, but AFAIK the
only meaningful impact we've ever heard of is that security researchers
cannot track the files they generate with deliberate collisions. The
same would be true with any hash type, for collisions within that hash
type.

Advantages of switching to a new hash type might include: reducing the
already small probability of collisions; choosing an algorithm that is
faster or that has (or is expected to have in the future) hardware
acceleration on commodity systems; and perhaps addressing user
perception (if SHA1 is seen as old and uncool). But then again, we
can't really get rid of SHA1...

[1] https://lists.apache.org/thread/v3dv1dtod2t9yrf920h4838g2t0l94cw

Regarding the second item:

Since the premise of this feature is to support adding new hash types
without bumping wc formats, it follows that any new hash type will
create compatibility problems for clients that support f32 but not the
specific new hash type. In light of that, it might just be better to
bump the wc format, so that you know at the outset that you need to
upgrade your client. Just thinking out loud here, but this might be
(partly) mitigated by trying to guess which hash types we might want
in the future and supporting them now, even if no existing client will
actually use them. I don't really like this idea, though.
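
To make the difference between the two failure modes concrete, here is
a minimal sketch. Everything in it is hypothetical: the names, the
metadata layout, and the "sha256" hash kind are assumptions for
illustration only and do not come from the Subversion code or the
branch.

```python
# Hypothetical sketch; none of these names come from Subversion.
CLIENT_MAX_FORMAT = 32
CLIENT_HASH_KINDS = {"sha1"}  # an f32 client that predates a newer hash


def can_open(wc_format: int, wc_hash_kind: str) -> tuple[bool, str]:
    """Return (ok, reason) for a working copy with the given metadata."""
    if wc_format > CLIENT_MAX_FORMAT:
        # Format bump: the mismatch is visible from the format number alone.
        return False, f"working copy format {wc_format} is too new"
    if wc_hash_kind not in CLIENT_HASH_KINDS:
        # Same format, unknown hash: the client understands the format
        # but still cannot use this working copy.
        return False, f"unsupported hash kind {wc_hash_kind!r}"
    return True, "ok"


if __name__ == "__main__":
    print(can_open(32, "sha1"))
    print(can_open(33, "sha1"))
    print(can_open(32, "sha256"))
```

In the last case the client supports the format yet still has to refuse
the working copy, which is the compatibility problem described above.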

I'll have to return later with more thoughts...

Cheers,
Nathan
