Hi,
I found some unexpected results with sort -V. I hope this is the correct
place to send a bug report [1].
They are caused by a bug in filevercmp inside gnulib, specifically in the
function match_suffix.
I assume it should, as documented, match a file ending as defined by this
regex: /(\.[A-Za-z~][A-Za-z0-9~]*)*$/
However, I found two cases where this does not happen:
1) Two consecutive dots. The function does not check whether the character
after a dot is itself a dot. As a result, nothing is matched in a case like
"a..a", even though the regex should match ".a".
Testcase: printf "a..a\na.+" | sort -V # a..a should be before a.+ I think
2) A trailing dot. If no character follows a dot, the dot is still matched
(e.g. for "a." the "." is matched).
Testcase: printf "a.\na+" | sort -V # I think a+ should be before a.
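
To make the expected behavior for 1) and 2) concrete, here is a minimal sketch of a reference matcher for the regex above. It is not the gnulib code and not the attached patch: reference_suffix, is_start_char and is_body_char are hypothetical names, and the backward scan is just one straightforward way to apply the regex as I read it.

```c
#include <stddef.h>
#include <string.h>

/* ASCII-only character classes, so the result does not depend on the
   locale.  */
static int
is_start_char (char c)          /* [A-Za-z~] */
{
  return ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z') || c == '~';
}

static int
is_body_char (char c)           /* [A-Za-z0-9~] */
{
  return is_start_char (c) || ('0' <= c && c <= '9');
}

/* Hypothetical helper (not part of gnulib): return a pointer to the
   start of the longest suffix of S matching
   (\.[A-Za-z~][A-Za-z0-9~]*)*$.  If no group matches, the result
   points at the terminating NUL byte.  */
static const char *
reference_suffix (const char *s)
{
  size_t pos = strlen (s);      /* start of the suffix matched so far */
  for (;;)
    {
      /* Walk backwards over the characters that may follow a dot.  */
      size_t p = pos;
      while (p > 0 && is_body_char (s[p - 1]))
        p--;
      /* A group needs a dot, at least one character after it, and a
         non-digit as the first of those characters.  */
      if (p == pos || p == 0 || s[p - 1] != '.' || !is_start_char (s[p]))
        break;
      pos = p - 1;              /* extend the suffix to include the dot */
    }
  return s + pos;
}
```

With this sketch, reference_suffix ("a..a") points at ".a", while for "a." and "a.+" it points at the empty string, which is the behavior I would expect from match_suffix in the two cases above.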

Additionally, I noticed that filevercmp ignores all characters after a NUL
byte.
This can be seen here: printf "a\0a\na" | sort -Vs
sort otherwise seems to take NUL bytes into account (that's why the --stable
flag is necessary in the above example). Is this the expected behavior?
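
My guess is that this follows from filevercmp taking NUL-terminated C strings, so anything after the first NUL byte is invisible to it. The sketch below only illustrates that difference; compare_bytes is a hypothetical helper, not something that exists in gnulib or sort.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two byte buffers with explicit lengths,
   so that embedded NUL bytes take part in the comparison.  Returns a
   negative, zero or positive value like strcmp.  */
static int
compare_bytes (const char *a, size_t alen, const char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  int diff = memcmp (a, b, n);
  if (diff != 0)
    return diff;
  /* Common prefix is equal: the shorter buffer sorts first.  */
  return (alen > blen) - (alen < blen);
}
```

For the two lines in the test case, strcmp ("a\0a", "a") returns 0 because it stops at the first NUL byte, whereas compare_bytes ("a\0a", 3, "a", 1) returns a positive value.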

Finally, I wanted to ask whether it is expected for filevercmp to fall back
to strcmp when it can't find any other difference, at least from the
perspective of sort.
This means that the --stable flag for sort has no effect in combination
with --version-sort (well, except if the input contains NUL bytes, as
mentioned above :)

I'll attach a rather simple patch to fix 1) and 2) (including a test); I hope
that's right.

Have a nice day,
Michael

[1]:
https://www.gnu.org/software/coreutils/manual/html_node/Reporting-bugs-or-incorrect-results.html#Reporting-bugs-or-incorrect-results
diff --git a/lib/filevercmp.c b/lib/filevercmp.c
index fca23ec4f..fdb4184d4 100644
--- a/lib/filevercmp.c
+++ b/lib/filevercmp.c
@@ -37,22 +37,24 @@ match_suffix (const char **str)
   bool read_alpha = false;
   while (**str)
     {
-      if (read_alpha)
+      if ('.' == **str)
+        {
+          if (!match || read_alpha)
+            match = *str;
+          read_alpha = true;
+        }
+      else if (read_alpha)
         {
           read_alpha = false;
           if (!c_isalpha (**str) && '~' != **str)
             match = NULL;
         }
-      else if ('.' == **str)
-        {
-          read_alpha = true;
-          if (!match)
-            match = *str;
-        }
       else if (!c_isalnum (**str) && '~' != **str)
         match = NULL;
       (*str)++;
     }
+  if (read_alpha)
+    return NULL;
   return match;
 }
 
diff --git a/tests/test-filevercmp.c b/tests/test-filevercmp.c
index d6c65fad9..a8496ec99 100644
--- a/tests/test-filevercmp.c
+++ b/tests/test-filevercmp.c
@@ -51,6 +51,10 @@ static const char *const examples[] =
   "a.b",
   "a.bc~",
   "a.bc",
+  "a+",
+  "a.",
+  "a..a",
+  "a.+",
   "b~",
   "b",
   "gcc-c++-10.fc9.tar.gz",
