On 24/04/2023 17:26, Christian Franke via Cygwin-apps wrote:
Jon Turney via Cygwin-apps wrote:
Detect filename collisions between packages
Don't check filenames under /etc/postinstall/ for collisions
Report when filename collisions exist
Add option '--collisions' to enable

IMO a useful enhancement.

:)

Notes:

Reading the file catalog from a package is moderately expensive in terms of
I/O: To extract all the filenames from a tar archive, we need to seek to
every file header, and to seek forward through a compressed file, we
must examine every intervening byte to decompress it.
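To illustrate the header-to-header walk described above, here is a minimal sketch, assuming an uncompressed ustar image already held in memory. The function name list_tar_names is hypothetical and is not setup's actual archive code. It ignores the ustar prefix field, GNU long-name records, pax headers and base-256 sizes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: list member names of an uncompressed ustar image.
// Simplified on purpose: only the 100-byte name field and the octal size
// field of each 512-byte header are read.
std::vector<std::string> list_tar_names(const std::string &image)
{
  const std::size_t block = 512;
  std::vector<std::string> names;
  std::size_t pos = 0;

  while (pos + block <= image.size())
    {
      const char *hdr = image.data() + pos;
      if (hdr[0] == '\0')
        break; // an all-zero block marks the end of the archive

      // Name field: offset 0, up to 100 bytes, NUL-padded.
      std::size_t len = 0;
      while (len < 100 && hdr[len] != '\0')
        ++len;
      names.emplace_back(hdr, len);

      // Size field: offset 124, 12 bytes, octal digits.
      const std::string size_field(hdr + 124, 12);
      const std::size_t size = std::strtoul(size_field.c_str(), nullptr, 8);

      // Skip the header plus the member data, rounded up to a full block.
      pos += block + (size + block - 1) / block * block;
    }
  return names;
}
```

On an uncompressed image the skip is a cheap pointer bump. On a compressed stream, every skipped byte still has to be produced by the decompressor, which is where the cost comes from.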

This adds a fourth(!) pass through each archive: one to checksum it, one
to extract files, another one (which I added in dbfd1a64 without thinking
too deeply about it) to extract symlinks, and now one to check for filename
collisions.

Using std::set_intersection() on values from std::map here is probably
a mistake. It's simple to write, but the performance is not good.

A faster alternative that avoids calling set_intersection in a loop might be one large data structure that maps filenames to sets of packages. Using multimap<string, string> instead of the more straightforward map<string, set<string>> possibly needs less memory (not tested). With a multimap, however, the same file/package name pair must not be inserted twice.

I attached a small standalone POC source file using multimap. It would also detect collisions in the already installed packages.
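The attachment is not reproduced here. As a rough sketch of the multimap idea only, and not the attached POC, the following assumes hypothetical names (FileOwners, add_owner, colliding_files) and shows the duplicate-pair guard that a multimap needs.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// filename -> owning package; one entry per (file, package) pair.
using FileOwners = std::multimap<std::string, std::string>;

// Insert the pair only if it is not already present, because a multimap
// would otherwise store it twice and report a bogus collision.
void add_owner(FileOwners &owners, const std::string &file,
               const std::string &pkg)
{
  auto range = owners.equal_range(file);
  for (auto it = range.first; it != range.second; ++it)
    if (it->second == pkg)
      return;
  owners.emplace(file, pkg);
}

// Return every filename that is owned by more than one package.
std::vector<std::string> colliding_files(const FileOwners &owners)
{
  std::vector<std::string> result;
  for (auto it = owners.begin(); it != owners.end();)
    {
      auto next = owners.upper_bound(it->first);
      if (std::distance(it, next) > 1)
        result.push_back(it->first);
      it = next;
    }
  return result;
}
```

Feeding both the installed packages and the packages about to be installed through add_owner would cover collisions in either group in a single pass over the multimap.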

Thanks for the ideas. It seems I really didn't think that carefully about this...

It seems that building a map from each filename to the set of package names which contain it, and then finding all the filenames where that set has more than one member, would possibly be a better implementation.
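A minimal sketch of that approach, under assumptions: the names OwnerMap and find_collisions, the package-to-file-list input, and the exact form of the "/etc/postinstall/" prefix are all hypothetical and are not taken from the patch.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// filename -> set of packages that contain it
using OwnerMap = std::map<std::string, std::set<std::string>>;

// Input (assumed shape): package name -> list of filenames it installs.
OwnerMap find_collisions(
    const std::map<std::string, std::vector<std::string>> &pkg_files)
{
  // Files under this directory are not checked for collisions.
  const std::string skip_prefix = "/etc/postinstall/";

  OwnerMap owners;
  for (const auto &pkg : pkg_files)
    for (const auto &file : pkg.second)
      {
        if (file.compare(0, skip_prefix.size(), skip_prefix) == 0)
          continue;
        // std::set silently ignores a repeated (file, package) pair.
        owners[file].insert(pkg.first);
      }

  // Keep only filenames owned by more than one package.
  for (auto it = owners.begin(); it != owners.end();)
    {
      if (it->second.size() < 2)
        it = owners.erase(it);
      else
        ++it;
    }
  return owners;
}
```

Each filename is inserted once per package, so the work grows with the total number of file entries rather than with the number of package pairs.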

[...]
Is the new file filemanifest.h required at all? It could be reduced to the following in install.cc:

#include <map>
...
typedef std::map<std::string, std::string> FileManifest;
// or more modern (C++11):
// using FileManifest = std::map<std::string, std::string>;

I think I had some idea to put the (de)serialization of the file manifests for installed packages into that class as well, but never got around to it (these need to be considered in the collision assessment, as well as newly installed packages).
