bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c

Rafal Luzynski Wed, 14 Mar 2018 15:55:35 -0700

14.03.2018 19:40 Pádraig Brady <[email protected]> wrote:
> [...]
> One can browse the abbreviations by length using:
>
> locale -a | grep utf8 |
> while read l; do LC_ALL=$l locale abmon; done |
> tr ';' '\n' | sort -u | grep '.\{5,\}' |
> while read mon; do
> printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon"
> done |
> sort -n | less
>
> That shows a couple of existing issues with the limit of 5.
> ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7 to be unambiguous,
> while Arabic needs 12!
> [...]
>
> $ LC_ALL=ln_CD.utf8 locale abmon
> sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá9.;sánz10.;sánzá11.;sánzá12.


Nice script, thank you. :-) The issue with ln_CD no longer applies;
it was fixed in June/July 2017. Please see the output on Fedora 28
(beta) with glibc 2.27:

$ LC_ALL=ln_CD.utf8 locale abmon
yan;fbl;msi;apl;mai;yun;yul;agt;stb;ɔtb;nvb;dsb

But that does not help, because some Arabic locales still need 12.
Even worse, your script run on the same machine gives the following
output (only the final lines shown):

...
11 siakwa kati
11 yahbra kati
11 تشرين الأول
11 كانون الأول
12 kakamuk kati
12 pastara kati
12 waupasa kati
12 تشرين الثاني
12 كانون الثاني
15 lî wainhka kati
15 lih mairin kati

Those with 15 characters come from the miq_NI locale, which was
introduced in September 2017 (glibc 2.27, released on Feb 1, 2018):

$ LC_ALL=miq_NI.utf8 locale abmon
siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati
$ LC_ALL=miq_NI.utf8 locale mon
siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati

However, as you can see, this locale data should be fixed, because
abmon and mon are identical; at the very least, the " kati" suffix
that appears in every name could probably be removed. Also,
truncating the strings to 12 characters would probably still leave
them unambiguous.

While we are at it (I have not checked): does your
tests/ls/abmon-align.sh script check for the length required to make
all abbreviated month names unambiguous (i.e., how far we can truncate
them while still keeping every month name distinct), or just for the
length of the longest abbreviated month name?
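To make the distinction concrete, here is a minimal sketch of the first kind of check for a single locale, assuming bash 4 or later (for associative arrays). The helper name min_unambiguous_width is hypothetical; it is not taken from coreutils or from tests/ls/abmon-align.sh. It reads one line of ';'-separated names, as printed by `locale abmon`, and prints the shortest prefix length at which all names are still pairwise distinct. Lengths come from bash's ${#s} and ${s:0:n}, so they are characters only when bash itself runs in a UTF-8 locale (bytes under LC_ALL=C), and unlike wc -L in the script above they are not display widths.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of coreutils or its test suite.
# Reads one line of ';'-separated month names (e.g. from "locale abmon")
# and prints the smallest prefix length at which all names stay distinct.
# Lengths are counted by bash: characters in a UTF-8 locale, bytes in C.
min_unambiguous_width() {
  local IFS=';'
  local -a names
  local -A seen
  local name prefix n max=0 dup
  read -r -a names
  # Longest name: trying prefixes longer than this is pointless.
  for name in "${names[@]}"; do
    (( ${#name} > max )) && max=${#name}
  done
  for (( n = 1; n <= max; n++ )); do
    seen=()                 # forget prefixes recorded at the previous width
    dup=0
    for name in "${names[@]}"; do
      [[ -z $name ]] && continue
      prefix=${name:0:n}
      if [[ -n ${seen["$prefix"]+set} ]]; then
        dup=1               # two names share this prefix: width n is ambiguous
        break
      fi
      seen["$prefix"]=1
    done
    if (( dup == 0 )); then
      echo "$n"
      return 0
    fi
  done
  # The full names themselves collide (or there was no input).
  echo "$max"
  return 1
}
```

For example, `LC_ALL=ar_SY.utf8 locale abmon | min_unambiguous_width`. Given an ASCII stand-in for the old ln_CD names quoted above (sanza1. through sanza12.), it prints 7, the same length Pádraig reported; checking only the longest name would not reveal that a 6-character prefix makes "sanza1." collide with "sanza10.".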

> $ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
> [...]

This is still true, although again, mon and abmon seem to be identical
in ar_SY, which is probably not the best we can have. I wish I could
fix it, if only I knew how. :) (BTW, other Arabic variants seem to have
shorter abbreviated month names.)

> [...]
> Given the increase in supported size should only impact relatively few languages
> it probably makes sense to increase to 12. The attached does that
> and also augments the test to find ambiguous cases.

12 is more than I asked for, but that's definitely not harmful.
My only remark: please remove "Lingala" from the commit message,
because it is no longer true. Otherwise the patch looks OK.

Thank you and best regards,

Rafal
