On Sat, Dec 7, 2024 at 11:14 PM Johan Corveleyn <jcor...@gmail.com> wrote:

> On Fri, Dec 6, 2024 at 12:27 PM Nikola Dipanov <
> ndipa...@hudson-trading.com> wrote:
> >
> > Hi all,
> >
> > A quick summary first: we’ve found that for certain use cases -
> increasing SVN_DELTA_WINDOW_SIZE  yields very significant storage savings
> on the size of the repo (~10x). We have a POC patch to make it configurable
> (currently only via fsfs.conf and using libsvn_ra_svn and libsvn_ra_local)
> and would be interested in working with the community to see if the changes
> could be improved, with the ultimate goal to have them accepted into the
> mainline.
> >
> > I have found some previous discussions on this from a good while back:
> https://svn.haxx.se/users/archive-2008-02/0547.shtml. but not much else.
> If there’s any additional information on this problem that I may have
> missed - please feel free to educate me!
> >
> > Otherwise if there are no glaring issues that anyone can see in
> supporting this - I’d be happy to look into cleaning up my POC patch and
> post it for detailed review.
> >
> > Obviously, there are trade-offs in doing this: A repo would most likely
> need to use a specified window size throughout its lifetime, and would not
> be beneficial for every use-case. This is why we propose to keep it as a
> config option. Some more details of our use-case and some rough numbers
> that illustrate the benefits follow.
> >
> > Our use case is that we commonly have very large files that see small
> changes over time, and a large xdelta window size would benefit such
> use-cases greatly. For us - the growth pace of our repositories is quite
> staggering due to this amplification, and we’d be more than happy to trade
> memory usage (especially, but also processing time to an extent) to be able
> to keep this in check. We’ve not done extensive measurements on how this
> impacts runtime yet.
> >
> > To demonstrate the effect - we generate a random and fairly large (~1.3
> G, ~30 million lines) XML file, commit it, then make random changes to it
> (from 300 lines to ~1% of lines), committed those, and then looked at the
> file size generated by the commit in repo/db/revs/.  We then repeat this,
> but with configuring a 100x larger window size (10240000). The most
> dramatic results are for small changes (this is somewhat intuitive) where
> the size of the revision with changes is ~40x smaller (10k vs 430k for a
> 1.3Gig file) when using a larger window size. For different patterns of
> changes the difference is not that dramatic but still large (5-10x). I am
> happy to share more details if people are interested.
> >
> > We believe we’d see a lot of benefit from this option, as would others
> in the community, and are very much committed to Subversion in the long
> run, so would love to hear what people think about something like this.
>
> That sounds quite interesting. However, I'm a bit worried about
> compatibility.
>
> The thread you metioned has one reply [1] which points to another thread
> about a possible backward compatibility issue [2].
>
> In that thread, Daniel Berlin wrote:
> > I also think we should up our default window size, but due to some
> > silliness in how this is currently implemented, changing the window size
> > is backwards incompatible!
> >
> > (This is because the code currently relies on knowing whether there are
> > more windows waiting to consume by comparing the window length against
> > the default window size, instead of by seeing if there is an EOF :( )
>
> This is confirmed by Branko Čibej in another reply [3].
>
> It's not clear to me whether those limitations still hold, but assuming
> they do there are some possible issues:
>

Hi Johan - thanks for a quick response first of all!

I believe the limitation mentioned in the email still holds. I believe this
is referring to [1]. However in our testing iirc, clients built with a
hardcoded, larger window size had no issues cloning a repo with a smaller
(aka default) window size. I will confirm what is actually going on here.

- fsfs.conf settings can be changed at any time during the lifetime of the
> repository. So the first 1000 revisions may have a different window size
> than later ones. The code reading those revisions cannot handle that (if I
> interpret the above threads correctly).
>
> At least in our current POC - the assumption is that this would not change
throughout the lifetime of the repository. Is this a problem in practice?
If an older client attempted to do operations on a repo with a version
higher AND a non-default window size it would fail to do so.

- A vanilla SVN server (or a client accessing the repository with file://)
> should at least be able to read the repository, even it was constructed
> with a different window size. Here too I fear this won't work (even if the
> window size would be a "immutable setting at creation time").
>
>
Yes this would be potentially a problem but only for the repositories that
do have a modified, non-default window size value. However let me
experiment a bit more with this and confirm soon.


> It all hinges on whether or not the current FSFS code can read data
> compressed with different window sizes (different from its own setting).
> Maybe things have changed since 2008 (there were several new versions of
> the fsfs format since then), maybe they haven't.
>
>
>From fsfs POV - the POC we have currently proposes to store the window size
(if modified, or defaulted to DELTA_WINDOW_SIZE) on svn_fs_t struct as a
new member (and the code is changed to use that and thread it where
necessary) so *I think* this would work but the caveat that it has to be
constant for the lifetime of the repository still holds.


> Perhaps you can perform some experiments to test for the above issues, as
> a form of further exploration.
>
>
I will revert back in a few days after digging some more into compatibility
questions, especially with code built without the proposed changes! Thanks
again for looking into this!

Best,
Nikola

[1]
https://github.com/apache/subversion/blob/b2ae6efaeeda78383961ae13151574a6df58734c/subversion/libsvn_delta/text_delta.c#L369

Reply via email to