On 24/04/2023 17:26, Christian Franke via Cygwin-apps wrote:
Jon Turney via Cygwin-apps wrote:
Detect filename collisions between packages
Don't check filenames under /etc/postinstall/ for collisions
Report when filename collisions exist
Add option '--collisions' to enable
IMO a useful enhancement.
:)
Notes:
Reading the file catalog from a package is moderately expensive in terms
of I/O: to extract all the filenames from a tar archive, we need to seek
to every file header, and to seek forward through a compressed file, we
must examine every intervening byte to decompress it.
This adds a fourth(!) pass through each archive: one to checksum it, one
to extract files, another one (which I added in dbfd1a64 without thinking
too deeply about it) to extract symlinks, and now one to check for
filename collisions.
Using std::set_intersection() on values from a std::map here is probably
a mistake. It's simple to write, but the performance is not good.
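
For illustration, the pairwise approach described above might look roughly
like the sketch below. The type and function names (PackageFiles,
Collision, find_collisions_pairwise) are hypothetical and not taken from
the patch; the point is only that every pair of packages costs one
std::set_intersection() call, so the work grows with the square of the
number of packages.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the names used in the patch.
using FileSet = std::set<std::string>;
using PackageFiles = std::map<std::string, FileSet>;  // package name -> files

struct Collision {
  std::string first_package;
  std::string second_package;
  std::vector<std::string> files;  // filenames present in both packages
};

// Compare every pair of packages. With P packages this makes
// P*(P-1)/2 calls to std::set_intersection().
std::vector<Collision> find_collisions_pairwise(const PackageFiles &packages) {
  std::vector<Collision> result;
  for (auto a = packages.begin(); a != packages.end(); ++a) {
    for (auto b = std::next(a); b != packages.end(); ++b) {
      std::vector<std::string> common;
      // std::set iterates in sorted order, as set_intersection requires.
      std::set_intersection(a->second.begin(), a->second.end(),
                            b->second.begin(), b->second.end(),
                            std::back_inserter(common));
      if (!common.empty())
        result.push_back({a->first, b->first, std::move(common)});
    }
  }
  return result;
}
```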
A faster alternative which avoids set_intersection calls in a loop is
possibly to use one large data structure which maps filenames to sets of
packages. Using multimap<string, string> instead of the straightforward
map<string, set<string>> possibly needs less memory (not tested). But
a multimap requires that no file/package name pair is inserted
twice.
I attached a small standalone POC source file using multimap. It would
also detect collisions in the already installed packages.
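
As a rough idea of the multimap variant (this is a minimal sketch, not the
attached POC; FileOwners, add_package_files and colliding_files are
made-up names), each filename maps to one entry per owning package, and a
filename collides if its key occurs more than once. The sketch assumes the
caller never adds the same file/package pair twice, since a multimap would
store the duplicate and report a false collision.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical name: filename -> owning package, one entry per owner.
using FileOwners = std::multimap<std::string, std::string>;

// Record all files of one package. Assumption: each package is added
// only once, so no file/package pair is ever inserted twice.
void add_package_files(FileOwners &owners, const std::string &package,
                       const std::vector<std::string> &files) {
  for (const auto &file : files)
    owners.emplace(file, package);
}

// Return every filename that has more than one entry (sorted, since
// the multimap is ordered by key).
std::vector<std::string> colliding_files(const FileOwners &owners) {
  std::vector<std::string> result;
  for (auto it = owners.begin(); it != owners.end();) {
    auto group_end = owners.upper_bound(it->first);  // end of equal keys
    if (std::distance(it, group_end) > 1)
      result.push_back(it->first);
    it = group_end;  // jump to the next distinct filename
  }
  return result;
}
```

A single pass over the multimap finds all collisions, with no per-pair
intersection step.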
Thanks for the ideas. It seems I really didn't think that carefully
about this...
It seems like building a map from filename to the set of package names
which contain it, and then finding all the filenames where that set has
more than one member, would possibly be a better implementation.
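
A minimal sketch of that idea, with hypothetical names (OwnerMap,
record_package, find_collisions) that are not from the patch: because the
owners of each file are held in a std::set, recording the same
file/package pair twice is harmless here.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical name: filename -> set of packages containing that file.
using OwnerMap = std::map<std::string, std::set<std::string>>;

// Record all files of one package. Repeated insertions of the same
// file/package pair are absorbed by the std::set.
void record_package(OwnerMap &owners, const std::string &package,
                    const std::vector<std::string> &files) {
  for (const auto &file : files)
    owners[file].insert(package);
}

// Keep only the filenames owned by more than one package.
OwnerMap find_collisions(const OwnerMap &owners) {
  OwnerMap collisions;
  for (const auto &entry : owners)
    if (entry.second.size() > 1)
      collisions.insert(entry);
  return collisions;
}
```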
[...]
Is the new file filemanifest.h required at all? It could be reduced to
the following in install.cc:
#include <map>
...
typedef std::map<std::string, std::string> FileManifest;
// or more modern (C++11):
// using FileManifest = std::map<std::string, std::string>;
I think I had some idea to put the (de)serialization of the file
manifests for installed packages into that class as well, but never got
around to it (these need to be considered in the collision assessment,
along with newly installed packages).