In trying to use `join` with `sort` I discovered odd behavior: even after running a file through `sort` using the same delimiter, `join` would still complain that it was out of order.
The field I am sorting on contains IP addresses. These can be of different lengths depending on how many digits each octet has, and the field includes periods as well as alphanumeric characters. Here is a way to reproduce the problem:

    > printf '1.1.1,2\n1.1.12,2\n1.1.2,1' | sort -t, > a.txt
    > printf '1.1.12,a\n1.1.1,b\n1.1.21,c' | sort -t, > b.txt
    > join -t, a.txt b.txt
    join: b.txt:2: is not sorted: 1.1.1,b

I would expect that a file sorted by `sort` is also considered sorted by `join`.

---

I traced this back to what I believe is a bug in sort.c when sorting on a field other than the last field: the pointer marking the end of the key is incremented one position further than it ought to be. On line 1675, it always increments the pointer one position beyond the delimiter unless the field is the last field. If both `eword` and `echar` are 0, `eword` was already incremented on line 1661. Later, when `keylim` (where the `limfield` value is stored) is used to calculate the length of the field, that length includes the delimiter, so the delimiter takes part in the comparison.

The following case suggests that the included delimiter is the problem. It uses a letter as the delimiter and runs correctly without error:

    > printf '1.1.1Z2\n1.1.12Z2\n1.1.2Z1' | sort -tZ > a.txt
    > printf '1.1.12Za\n1.1.1Zb\n1.1.21Zc' | sort -tZ > b.txt
    > join -tZ a.txt b.txt

In join.c, by contrast, the comparison uses the contents of the keys without the delimiter. On join.c:283 we call `extract_field` with `ptr` pointing to the start of the key and the length defined as `sep - ptr`, where `sep` is the position of the delimiter character (`tab` in join.c).

Cases illustrating the bug in sort:

    > printf '12,\n1,\n' | sort -t, -k1
    1,
    12,
    > printf '12,a\n1,a\n' | sort -t, -k1
    12,a
    1,a

Thank you,
Beth Andres-Beck