Jon Turney via Cygwin-apps wrote:
Detect filename collisions between packages
Don't check filenames under /etc/postinstall/ for collisions
Report when filename collisions exist
Add option '--collisions' to enable

IMO a useful enhancement.


Notes:

Reading file catalog from a package is moderately expensive in terms of
I/O: To extract all the filenames from a tar archive, we need to seek to
every file header, and to seek forward through a compressed file, we
must examine every intervening byte to decompress it.

This adds a fourth(!) pass through each archive (one to checksum it, one
to extract files, another one (I added in dbfd1a64 without thinking too
deeply about it) to extract symlinks), and now one to check for filename
collisions).

Using std::set_intersection() on values from std::map() here is probably
a mistake. It's simple to write, but the performance is not good.

A faster alternative which avoids set_intersection calls in a loop is possibly to use one large data structure which maps filenames to sets of packages. Using multimap<string, string> instead of the straightforward map<string, set<string>> needs possibly less memory (not tested). But for multimap it is required that file/package name pairs are not inserted twice.

I attached a small standalone POC source file using multimap. It would also detect collisions in the already installed packages.


...
diff --git a/filemanifest.h b/filemanifest.h
new file mode 100644
index 0000000..cc6fb81
--- /dev/null
+++ b/filemanifest.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2022 Jon Turney
+ *
+ *     This program is free software; you can redistribute it and/or modify
+ *     it under the terms of the GNU General Public License as published by
+ *     the Free Software Foundation; either version 2 of the License, or
+ *     (at your option) any later version.
+ *
+ *     A copy of the GNU General Public License can be found at
+ *http://www.gnu.org/

"SPDX-License-Identifier: GPL-2.0-or-later" is possibly a more brief and modern way to say this in new code.


+ *
+ */
+
+/*
+  FileManifest
+
+  A serializable set of filenames
+  each filename is associated with the package name that owns it
+
+  Used when working with the install package file manifests in /etc/setup
+*/
+
+#include <string>
+#include <unordered_map>

This should be <map> as std::unordered_map is not used in the code.


+#include <algorithm>
+
+class FileManifest: public std::map<std::string, std::string>
+{
+};

Is the new file filemanifest.h required at all? It could be reduced to the following in install.cc:

#include <map>
...
typedef std::map<std::string, std::string> FileManifest;
// or more modern (C++11):
// using FileManifest = std::map<std::string, std::string>;


--
Regards,
Christian

#include <cstdio>
#include <map>
#include <string>

using FileManifest = std::multimap<std::string, std::string>;

int main()
{
  FileManifest manifest;

  // read from /etc/setup/*.lst.gz
  const FileManifest installed {
    {"/usr/bin/file11", "package1"}, // #1 package2
    {"/usr/bin/file12", "package1"}, // #2 package3
    {"/usr/bin/file21", "package2"},
    {"/usr/bin/file22", "package2"}, // #3 package3, package4
    {"/usr/bin/file11", "package2"}  // #1 package1
  };
  manifest.insert(installed.begin(), installed.end());

  // read from tar files to be installed
  const FileManifest install {
    {"/usr/bin/file31", "package3"},
    {"/usr/bin/file12", "package3"}, // #2 package1
    {"/usr/bin/file22", "package3"}, // #3 package2, package4
    {"/usr/bin/file41", "package4"},
    {"/usr/bin/file42", "package4"},
    {"/usr/bin/file22", "package4"}  // #3 package2, package3
  };
  manifest.insert(install.begin(), install.end());
  
  for (auto i = manifest.cbegin(), end = manifest.cend(); i != end; ) {
    auto j = i;
    if (++j != end && j->first == i->first) {
      std::printf("Packages");
      j = i;
      do
        std::printf(" %s", j->second.c_str());
      while (++j != end && j->first == i->first);
      std::printf(" contain file %s\n", i->first.c_str());
    }
  //else
    //std::printf("Package %s contains file %s\n",
    //            i->second.c_str(), i->first.c_str());
    i = j;
  }
  return 0;
}

Reply via email to