With the following input:
> $ cat x
> "ⁿᵘˡˡ"
> "ܥܝܪܐܩ"
Running "uniq -c" says there are two copies of the same line!
> $ uniq -c x
> 2 "ⁿᵘˡˡ"
I've attached a copy of the test file, and here's the octal dump:
> $ od -b x
> 000 042 342 201 277 341 265 230 313 241 313 241 042 012 042 33
Yup, this does depend on the locale. In my original example, I had
LANG=en_US.UTF-8. Setting it to C.UTF-8 gets me the right result:
> $ LANG=C.UTF-8 uniq -c x
> 1 "ⁿᵘˡˡ"
> 1 "ܥܝܪܐܩ"
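
A check that doesn't depend on which UTF-8 locales happen to be installed is to force the plain C locale. This is only a minimal sketch: the temporary file stands in for x, and it relies on LC_ALL overriding LANG and on uniq comparing lines byte by byte in the C locale.

```shell
# Recreate the two-line test file; "$tmp" is a stand-in for the file x.
tmp=$(mktemp)
printf '%s\n' '"ⁿᵘˡˡ"' '"ܥܝܪܐܩ"' > "$tmp"

# LC_ALL takes precedence over LANG and the other LC_* variables.
# In the C locale the two lines are compared as raw bytes.
result=$(LC_ALL=C uniq -c "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

If both lines come back with a count of 1 here, the input itself is fine and the difference lies in the collation rules of the locale that merged them.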
But that doesn't fully explain what's going on. I find it difficult to
believe that there's any
, iraq);
printf("m = %d\n", m);
}
That correctly says the strings are different:
$ LANG=en_US.UTF-8 ./a.out
ⁿᵘˡˡ
ܥܝܪܐܩ
m = 6
> On Dec 16, 2019, at 7:46 PM, Roy Smith wrote:
>
> Yup, this does depend on the locale. In my original example, I had
> LANG=en_US.UTF-8