On Tue, Jan 22, 2013 at 09:05:11AM +0100, mauro tonon wrote:
> 2013/1/22 Peter A. Shevtsov <petr.shevt...@gmail.com>:
> > On 22/01/13 at 02:32pm, Peter A. Shevtsov wrote:
> >
> >> It seems that it counts every Cyrillic letter as two, i.e. it doesn't
> >> count letters (or runes) but bytes.
> >
> > Indeed,
> >
> > echo latin ?????????????????? | /usr/local/plan9/bin/awk '{printf("%d %d\n", length($1), length($2))}'
> >
> > 5 18
> >
> 
> Also, awk can't know beforehand if the input string is UTF-8 encoded
> or not, so the only thing it can do is to count bytes....

Don't we have environment variables for that? Or do they suck?

In Plan 9, everything is UTF-8, no?

Anyway, I say stick with counting bytes, for better performance!
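
To make the bytes-versus-runes difference concrete, here is a minimal sketch in Go. Go is my choice for illustration, not anything this thread or awk uses. The Cyrillic word is an arbitrary stand-in, since the original input shows up only as question marks above, and counts is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the length of s in bytes
// and in runes. Each invalid UTF-8 byte is counted as a single rune.
func counts(s string) (nbytes, nrunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "кириллица" is an assumed sample word, not the input from the thread.
	for _, w := range []string{"latin", "кириллица"} {
		b, r := counts(w)
		// utf8.ValidString reports whether the bytes decode as UTF-8 at all.
		fmt.Printf("%s: %d bytes, %d runes, valid UTF-8: %v\n",
			w, b, r, utf8.ValidString(w))
	}
}
```

Under these assumptions "latin" comes out as 5 bytes and 5 runes, and the Cyrillic word as 18 bytes and 9 runes. The byte count is just the stored length; the rune count has to scan the whole string, which is where the performance argument comes from.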
