Hi all,

With the recently added support for LZ4 compression (r1801940 et al),
we now have the option of using it by default for the on-disk data and
over the wire.

For those who haven't been following this topic, here's a quick recap:

- Currently, our default compression algorithm is zlib.

- LZ4 offers much faster compression and decompression than zlib and
  includes a heuristic to skip incompressible data.

- LZ4 has a worse compression ratio than zlib-5 (our current default).
  In this respect, it is roughly comparable to zlib-1, although zlib-1
  still has a slightly better compression ratio. See
  https://quixdb.github.io/squash-benchmark/ for additional information
  on this (the codecs to compare are "lz4 - 7" and "zlib - 1").

- Only the new filesystem format 8 allows using LZ4 for the on-disk
  data.

- Using LZ4 over the wire requires both endpoints to advertise that
  they support the new svndiff2 format, which allows LZ4 compression.

There are two questions to consider:

(1) Do we want to start using LZ4 compression over the wire by default?
    If yes, do we want this default to apply to all installations, or
    only to those installations where it makes sense?

(2) Do we want to switch to LZ4 compression for the on-disk data by
    default?

I propose the following approach. Please note that for the wire format
part, it only considers the http:// protocol, but we can optionally
adjust svn:// later:

(A) For the HTTP wire format, we start using LZ4 compression by default,
but only over local networks.

The reasoning behind this is that we probably wouldn't want to always
use LZ4 compression, as that would result in a regression over WAN,
where the better compression ratio is usually preferable to compression
speed.

Another point is that even for local networks we cannot disable
compression altogether, because on slow 10 or even 100 Mbps LANs, where
the throughput is limited by the network, using fast compression can be
better than no compression. This is where LZ4 comes to the rescue by
offering a reasonable compression ratio and fast compression speed.
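
The trade-off described in (A) can be sketched with a rough
back-of-the-envelope model in Python. Everything in it is an assumption
made for illustration: the codec speeds and ratios are made-up round
numbers rather than measurements, the names CODECS,
effective_throughput and best_codec are hypothetical, decompression
cost is ignored, and compression is assumed to overlap with the
transfer so that the slower of the two stages is the bottleneck.

```python
import math

# Illustrative assumptions only, not measurements.
# speed: compression throughput in MB/s of uncompressed input
# ratio: uncompressed size divided by compressed size
CODECS = {
    "none":   {"speed": math.inf, "ratio": 1.0},
    "zlib-5": {"speed": 40.0,     "ratio": 2.8},
    "lz4":    {"speed": 400.0,    "ratio": 2.1},
}


def effective_throughput(codec, link_mbps):
    """Return MB/s of uncompressed data delivered over the link.

    Assumes compression and transfer are pipelined, so the slower
    stage limits the overall rate.
    """
    c = CODECS[codec]
    link_mb_per_s = link_mbps / 8.0  # megabits/s -> megabytes/s
    return min(c["speed"], link_mb_per_s * c["ratio"])


def best_codec(link_mbps):
    """Return the codec name with the highest effective throughput."""
    return max(CODECS, key=lambda name: effective_throughput(name, link_mbps))


if __name__ == "__main__":
    for link in (10, 100, 1000):
        rates = {name: effective_throughput(name, link) for name in CODECS}
        print(link, "Mbps:", rates, "-> best:", best_codec(link))
```

Under these assumed numbers, the higher-ratio codec wins while the
network is the bottleneck, the faster codec wins once the compressor
becomes the bottleneck, and on a 100 Mbps link LZ4 still beats sending
the data uncompressed. Real results depend on the data and the
hardware.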

This approach is currently implemented with the http-compression=auto
client-side configuration option (r1803899), which is the new default.

While the HTTP client is generally in charge of choosing the compression
algorithm, there's also a way to override its preference on the server.
If mod_dav_svn's SVNCompressionLevel directive is set to 1, the server
overrides the client's preference and still sends svndiff2 / LZ4 data
if the client can accept it.

(B) For the on-disk data, we start using LZ4 compression by default (in
format 8 repositories).

The reasoning behind this is that currently, zlib compression is a
hotspot that can limit the performance of both read and write operations
on the repository. It also affects how well Subversion works when
dealing with large and, possibly, incompressible files (and I tend to
think that it's a fairly important use case). Switching to a faster
compression algorithm that is also used by various other file system
implementations should visibly improve the performance of such
operations.

Please note that this change is a trade-off between the compression
ratio and the speed of the operations. Repositories using LZ4
compression would require a bit more disk space. The amount of
additional space is proportional to the difference between the
compression ratios of LZ4 and zlib-5, which can be roughly estimated at
around 30-35% for compressible binary and text files, although that may
vary depending on the actual data.

To illustrate how these changes will affect the speed of some of the
operations, the 'svn import' of a 2 GB file over HTTP on LAN in my
environment takes 18 seconds instead of 63 seconds.

How does this sound? Are there any objections or suggestions to the
proposed approach?

(Please note that most of the implementation is already in place, and to
get the described behavior we would just have to change a couple of
default settings.)

Thanks,
Evgeny Kotkov