On Tue, Jan 22, 2013 at 09:05:11AM +0100, mauro tonon wrote:
> 2013/1/22 Peter A. Shevtsov <petr.shevt...@gmail.com>:
> > On 22/01/13 at 02:32pm, Peter A. Shevtsov wrote:
> >
> >> It seems that it counts every cyrillic letter as two, i. e. it ain't count
> >> letters (or runes) but bytes.
> >
> > Indeed,
> >
> > echo latin ?????????????????? | /usr/local/plan9/bin/awk '{printf("%d %d\n", length($1), length($2))}'
> >
> > 5 18
> >
>
> Also, awk can't know beforehand if the input string is UTF-8 encoded
> or not, so the only thing it can do is to count bytes....

Don't we have environment vars for that? Or do they suck? In Plan 9,
everything is UTF-8, no? Anyway, I say stick with counting bytes, for
better performance!
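
For what it's worth, here is a minimal sketch of the difference being discussed, in Python rather than awk. It assumes the input is valid UTF-8. The function names are made up for illustration, and the Cyrillic word is only a stand-in, since the original second field did not survive in the quoted message. Counting runes from raw bytes just means skipping continuation bytes (those of the form 10xxxxxx), which costs one extra test per byte.

```python
def byte_length(s: str) -> int:
    """Length in bytes of the UTF-8 encoding of s (what the awk run above reports)."""
    return len(s.encode("utf-8"))


def rune_length(data: bytes) -> int:
    """Count runes (code points) in UTF-8 data by skipping continuation bytes.

    Assumes data is valid UTF-8; continuation bytes have the form 0b10xxxxxx.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    # "кириллица" is a hypothetical stand-in for the lost second field.
    for word in ("latin", "кириллица"):
        print(word, byte_length(word), rune_length(word.encode("utf-8")))
```

With these two words it prints 5 5 for the Latin one and 18 9 for the Cyrillic one, so the 18 in the quoted output is consistent with a nine-letter word counted in bytes.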