Jordan Peck <[email protected]> writes:

> This patch saves disk space on supported filesystems for byte-identical file
> installation by utilising filesystem clone APIs.
>
> The main benefit is reduced disk space usage during checkout, update, and
> other working-copy file installs on filesystems that support copy-on-write/
> block cloning. When the working file is byte-identical to the pristine file,
> Subversion can install it by copying or reflinking from the pristine instead
> of storing two independent physical copies of the same bytes.

I'm not entirely against this, but in its current form the patch adds
a lot of complex machinery for fairly situational gains (for example,
when using a non-default file system such as ReFS on Windows).
It also isn't universally applicable to all operations and risks
several regressions.

Also, the --store-pristine=no option already provides a consistent size
reduction on all file systems and OSes, without disabling the streaminess,
regressing other characteristics or requiring any kind of in-advance probing.

More specifically on some of the above points:

- The optimization is far from universally applicable.  For example, it
  doesn't cover the default file system on Windows (NTFS).

- The required probing is too expensive to run in every scope where it
  would be needed.

  For example, probing on Windows calls GetVolumeInformation().  Probing
  on other OSes has to create a file to check whether copy-on-write is
  supported.

  The patch limits this probing only for larger scopes, such as attaching it
  to an update editor instance, but smaller-scoped operations like workqueue
  installs or file reverts are left without probing.  If not all code paths
  use the copy-on-write path, the size reduction may not be sustainable or
  predictable in the long run.

- Skipping the streamy checkout path can potentially regress performance and
  cause spurious HTTP timeouts and failures.  To some extent, this is a step
  back from what we currently have on trunk.

- The new copy_file_to_temp_copyfile_windows() appears to make significantly
  more syscalls than the current code on trunk.  It opens and closes a temp
  file, performs the copy, and changes the file attributes and mtime, all
  with separate path-based calls, which are very expensive on Windows.

  Compared to that, the current code on trunk does all of that through a
  single handle, which is also better from an atomicity perspective.

- APIs like CopyFile2() seem to require an actual file to copy from, so
  this rules out using the optimization with any form of compressed or
  db-stored pristines without first unpacking them to a temp file, which
  would itself result in a different regression.


Regards,
Evgeny Kotkov

Reply via email to