bug#49340: small sort takes hours for UTF-8 locale

2021-07-02 Thread Paul Eggert
On 7/2/21 4:19 PM, Pádraig Brady wrote:
> we might be able to improve things. For example, using strxfrm() + strcmp() to minimize processing.

I tried that long ago, and it was waaayyy slower than strcoll in the typical case. glibc strxfrm is not at all optimized. Which is fine, since strxfrm i
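For readers unfamiliar with the strxfrm() + strcmp() idea being discussed: each line is transformed once into a collation key, and the sort then compares keys with plain strcmp(), which orders them the same way strcoll() orders the original strings. The sketch below is only an illustration of that general technique. It is not coreutils code, and the names `make_sort_key`, `keyed_line` and `sort_lines_by_key` are hypothetical. Per the report above, with glibc's strxfrm this approach may well be slower than calling strcoll directly, and the keys may be much longer than the input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a collation key for S under the current
   LC_COLLATE locale.  strcmp() on two such keys orders them the same way
   strcoll() orders the original strings.  Returns a malloc'd string,
   or NULL if out of memory. */
char *make_sort_key(const char *s)
{
    /* With a size of 0 (and a null destination), strxfrm() only reports
       the key length, excluding the terminating null byte. */
    size_t len = strxfrm(NULL, s, 0);
    char *key = malloc(len + 1);
    if (key != NULL)
        strxfrm(key, s, len + 1);
    return key;
}

/* Hypothetical pairing of a line with its precomputed key. */
struct keyed_line {
    const char *line;
    char *key;
};

static int compare_keyed(const void *pa, const void *pb)
{
    const struct keyed_line *a = pa;
    const struct keyed_line *b = pb;
    return strcmp(a->key, b->key); /* cheap byte-wise comparison */
}

/* Sort LINES[0..N) into locale order, calling strxfrm() once per line and
   only strcmp() during the O(n log n) comparisons.
   Returns 0 on success, -1 if memory runs out (LINES is then unchanged). */
int sort_lines_by_key(const char **lines, size_t n)
{
    if (n == 0)
        return 0;
    struct keyed_line *v = malloc(n * sizeof *v);
    if (v == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        v[i].line = lines[i];
        v[i].key = make_sort_key(lines[i]);
        if (v[i].key == NULL) {
            /* Free the keys built so far (indices 0 .. i-1). */
            while (i > 0)
                free(v[--i].key);
            free(v);
            return -1;
        }
    }
    qsort(v, n, sizeof *v, compare_keyed);
    for (size_t i = 0; i < n; i++) {
        lines[i] = v[i].line;
        free(v[i].key);
    }
    free(v);
    return 0;
}
```

The trade-off is that the transform cost is paid up front for every line, so the approach only helps if strxfrm is not much more expensive than strcoll, which is exactly what the reply above disputes for glibc.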

bug#49340: small sort takes hours for UTF-8 locale

2021-07-02 Thread Pádraig Brady
On 02/07/2021 20:32, Jon Klaas wrote:
> Hello, I encountered a file that was taking hours to sort that was expected to take negligible time. This seems to be due to the locale LANG=en_US.UTF-8. I've worked around the problem by using LC_ALL=C, but thought I would report this, as I didn't see a r

bug#49340: small sort takes hours for UTF-8 locale

2021-07-02 Thread Jon Klaas
Hello, I encountered a file that was taking hours to sort that was expected to take negligible time. This seems to be due to the locale LANG=en_US.UTF-8. I've worked around the problem by using LC_ALL=C, but thought I would report this, as I didn't see a relevant bug report. This was seen on ce
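The LC_ALL=C workaround changes the collation rules: in the C/POSIX locale, strcoll() orders strings byte by byte, just like strcmp(), so a sort can presumably compare lines without any locale-specific collation work. The sketch below, which uses a hypothetical helper `collation_matches_bytes`, shows that difference from C. Note that it changes the process-wide LC_COLLATE setting, and that "en_US.UTF-8" may not be installed on every system.

```c
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so that results from
   different comparison functions can be compared directly. */
static int sign(int v) { return (v > 0) - (v < 0); }

/* Hypothetical helper: switch LC_COLLATE to LOCALE, then return 1 if
   strcoll() orders A and B the same way as byte-wise strcmp(), 0 if it
   does not, or -1 if LOCALE is not available on this system.
   Side effect: leaves LC_COLLATE set to LOCALE on success. */
int collation_matches_bytes(const char *locale, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;
    return sign(strcoll(a, b)) == sign(strcmp(a, b));
}
```

For example, in the "C" locale the call with "a" and "B" returns 1, because strcoll() there follows byte order ('B' sorts before 'a'). Under a glibc en_US.UTF-8 locale the result can differ, since that locale's collation is not plain byte order.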