On Fri, Feb 24, 2017 at 01:03:09PM -0500, Mark Phippard wrote:
> Note that while this does fix the error, but because of the sha1 storage
> sharing in the working copy you actually do not get the correct files.
> Both PDF's wind up being the same file, I imagine whichever one you receive
> first is the one you get.
>
> So not only does rep sharing need to be fixed, the WC pristine storage is
> also broken by this.

Yes, indeed. I believe we should prepare a new working copy format for
1.10.0 which addresses this problem. I don't see a good way of fixing
it without a format bump. The bright side of this is that it gives us
a good reason to get 1.10.0 ready ASAP.

We can switch to a better hash algorithm with a WC format bump.

If we are willing to give up de-duplication in the pristine store, we
could make the pristine store future-proof by adding a "salt" to each
row in the pristine table. Say 64 bytes of data prepended to the file
content, which are random but stay fixed throughout the lifetime of a
pristine. This way, there are 64 bytes of data not controlled by
repository content which affect the hash algorithm's result before
data from repository content gets mixed in. Hash collisions in
repository content then become much less of a problem for the working
copy. However, the pristine store would stop de-duplicating content,
so perhaps this is not the best approach.

The rep-cache uses hashes only for de-duplication, so it very much
relies on hash collisions being negligible. We should upgrade the
hashing algorithm in a way that 'svnadmin upgrade' can take care of
(for new revisions). Perhaps we should disable rep-sharing by default
in a 1.9.x patch release and advise users to turn it off until they
can upgrade to 1.10.

We might have to give up on ra_serf's approach of avoiding
retransmission of content which is already stored in the pristine
store. This is now just as broken as the rep-cache is. We might be
able to salvage it for future clients, but we should probably send
multiple hashes and make it as easy as possible to add newer hash
algorithms in future versions without disturbing older clients.
Perhaps as a first step we should just disable this feature?
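
To make the salt idea above a bit more concrete, here is a rough Python
sketch. It is only an illustration under my own assumptions: the names
(PristineStore, salted_checksum, SALT_LEN) are made up, the "table" is
an in-memory dict, and none of this reflects the actual wc.db schema or
any Subversion API. It keeps SHA-1 on purpose, to show that the key now
depends on 64 random bytes the repository content does not control.

```python
import hashlib
import os

SALT_LEN = 64  # bytes of per-pristine random data, as suggested above


def salted_checksum(salt: bytes, content: bytes) -> str:
    """Hash the salt first, then the file content (hypothetical helper)."""
    h = hashlib.sha1()
    h.update(salt)      # random data, not controlled by the repository
    h.update(content)   # repository-controlled data gets mixed in afterwards
    return h.hexdigest()


class PristineStore:
    """Toy stand-in for the pristine table; not the real wc.db layout."""

    def __init__(self):
        # salted checksum -> (salt, content)
        self.rows = {}

    def install(self, content: bytes) -> str:
        # The salt is generated once and stays fixed for this pristine.
        salt = os.urandom(SALT_LEN)
        key = salted_checksum(salt, content)
        self.rows[key] = (salt, content)
        return key

    def read(self, key: str) -> bytes:
        salt, content = self.rows[key]
        # Verify the stored content against its salted checksum.
        if salted_checksum(salt, content) != key:
            raise ValueError("pristine content does not match its checksum")
        return content


if __name__ == "__main__":
    store = PristineStore()
    a = store.install(b"identical content")
    b = store.install(b"identical content")
    # Same content, two rows: de-duplication is gone, as noted above.
    print(a != b, len(store.rows))
```

Installing the same content twice yields two different keys and two
rows, which is exactly the lost de-duplication mentioned above. The
flip side is that two colliding files crafted in advance would each get
their own salt and their own row.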