Jordan Peck <[email protected]> writes: > This patch saves disk space on supported filesystems for byte-identical file > installation by utilising filesystem clone APIs. > > The main benefit is reduced disk space usage during checkout, update, and > other working-copy file installs on filesystems that support copy-on-write/ > block cloning. When the working file is byte-identical to the pristine file, > Subversion can install it by copying or reflinking from the pristine instead > of storing two independent physical copies of the same bytes.
I'm not entirely against this, but in its current form the patch adds a lot of complex machinery for fairly situational gains (for example, when using a non-default file system such as ReFS on Windows). It also isn't universally applicable to all operations and risks several regressions. Also, the --store-pristine=no option already provides a consistent size reduction on all file systems and OSes, without disabling the streaminess, regressing other characteristics or requiring any kind of in-advance probing. More specifically on some of the above points: - The optimization is far from universally applicable. For example, it doesn't cover the default file system on Windows (NTFS). - The required probing is too expensive to run in every scope where it would be needed. For example, probing on Windows calls GetVolumeInformation(). Probing on other OSes has to create a file to check whether copy-on-write is supported. The patch limits this probing only for larger scopes, such as attaching it to an update editor instance, but smaller-scoped operations like workqueue installs or file reverts are left without probing. If not all code paths use the copy-on-write path, the size reduction may not be sustainable or predictable in the long run. - Skipping the streamy checkout path can potentially regress performance and cause spurious HTTP timeouts and failures. To some extent, this is a step back from what we currently have on trunk. - The new copy_file_to_temp_copyfile_windows() appears to make significantly more syscalls than the current code on trunk. It opens and closes a temp file, performs the copy, and changes the file attributes and mtime, all with separate path-based calls, which are very expensive on Windows. Compared to that, the current code on trunk does all of that through a single handle, which is also better from an atomicity perspective. - APIs like CopyFile2() seem to require an actual file to copy from, so this rules out using the optimization with any form of compressed or db-stored pristines without first unpacking them to a temp file, which would itself result in a different regression. Regards, Evgeny Kotkov

