Hi,

I found some unexpected results with sort -V. I hope this is the correct place to send a bug report [1].

They are caused by a bug in filevercmp in gnulib, specifically in the function match_suffix. I assume it should, as documented, match a file ending as defined by this regex:

    /(\.[A-Za-z~][A-Za-z0-9~]*)*$/

However, I found two cases where this does not happen:

1) Two consecutive dots. The code does not check whether the character after a dot is itself a dot. As a result, nothing is matched in a case like "a..a", even though ".a" should match according to the regex.

   Testcase:
       printf "a..a\na.+" | sort -V   # I think a..a should be before a.+

2) A trailing dot. If there is no additional character after a dot, the dot is still matched (e.g. for "a." the "." is matched).

   Testcase:
       printf "a.\na+" | sort -V   # I think a+ should be before a.
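To make the expected behavior in the two cases concrete, here is a minimal standalone C sketch of a matcher that follows the documented regex. It is only an illustration: the names suffix_start, is_alpha_or_tilde and is_alnum_or_tilde are hypothetical and not part of gnulib, and the character checks are plain ASCII comparisons instead of gnulib's c_isalpha/c_isalnum.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only, locale-independent character classes (hypothetical helpers). */
static bool
is_alpha_or_tilde (char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '~';
}

static bool
is_alnum_or_tilde (char c)
{
  return is_alpha_or_tilde (c) || (c >= '0' && c <= '9');
}

/* Return a pointer to the longest suffix of S that matches
   (\.[A-Za-z~][A-Za-z0-9~]*)*$, or NULL if only the empty suffix matches.  */
static const char *
suffix_start (const char *s)
{
  const char *match = NULL;   /* start of the current candidate suffix */
  bool after_dot = false;     /* true if the previous character was '.' */

  for (; *s; s++)
    {
      if (*s == '.')
        {
          /* A dot starts a new candidate, or restarts the candidate
             when it directly follows another dot (case 1).  */
          if (!match || after_dot)
            match = s;
          after_dot = true;
        }
      else if (after_dot)
        {
          /* The character right after a dot must be a letter or '~'.  */
          after_dot = false;
          if (!is_alpha_or_tilde (*s))
            match = NULL;
        }
      else if (!is_alnum_or_tilde (*s))
        match = NULL;
    }

  /* A trailing dot has no following character, so nothing matches (case 2).  */
  return after_dot ? NULL : match;
}
```

With this sketch, suffix_start ("a..a") points at ".a", while suffix_start ("a.") and suffix_start ("a.+") both return NULL, which is what the regex suggests.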
Additionally, I noticed that filevercmp ignores all characters after a NUL byte. This can be seen here:

    printf "a\0a\na" | sort -Vs

sort otherwise seems to take NUL bytes into account (that's why the --stable flag is necessary in the above example). Is this the expected behavior?

Finally, I wanted to ask whether it is expected for filevercmp to fall back to strcmp when it can't find another difference, at least from the perspective of sort. This means that the --stable flag of sort has no effect in combination with --version-sort (well, except if the input contains NUL bytes, as mentioned above :)

I'll attach a rather simple patch to fix 1) and 2) (including a test); I hope that's right.

Have a nice day,
Michael

[1]: https://www.gnu.org/software/coreutils/manual/html_node/Reporting-bugs-or-incorrect-results.html#Reporting-bugs-or-incorrect-results

diff --git a/lib/filevercmp.c b/lib/filevercmp.c
index fca23ec4f..fdb4184d4 100644
--- a/lib/filevercmp.c
+++ b/lib/filevercmp.c
@@ -37,22 +37,24 @@ match_suffix (const char **str)
   bool read_alpha = false;
   while (**str)
     {
-      if (read_alpha)
+      if ('.' == **str)
+        {
+          if (!match || read_alpha)
+            match = *str;
+          read_alpha = true;
+        }
+      else if (read_alpha)
         {
           read_alpha = false;
           if (!c_isalpha (**str) && '~' != **str)
             match = NULL;
         }
-      else if ('.' == **str)
-        {
-          read_alpha = true;
-          if (!match)
-            match = *str;
-        }
       else if (!c_isalnum (**str) && '~' != **str)
         match = NULL;
       (*str)++;
     }
+  if (read_alpha)
+    return NULL;
   return match;
 }
 
diff --git a/tests/test-filevercmp.c b/tests/test-filevercmp.c
index d6c65fad9..a8496ec99 100644
--- a/tests/test-filevercmp.c
+++ b/tests/test-filevercmp.c
@@ -51,6 +51,10 @@ static const char *const examples[] =
   "a.b",
   "a.bc~",
   "a.bc",
+  "a+",
+  "a.",
+  "a..a",
+  "a.+",
   "b~",
   "b",
   "gcc-c++-10.fc9.tar.gz",