Chris Uppal wrote:

> Since the interpretation of characters which are yet to be added to
> Unicode is undefined (will they be digits, "letters", operators,
> symbol, punctuation.... ?), there doesn't seem to be any sane way
> that a language could allow an unrestricted choice of Unicode in
> identifiers.

The Perl code below prints:

  xdigit
      22 /194522 =  0.011% (lower:     6, upper:     6)

  ascii
     128 /194522 =  0.066% (lower:    26, upper:    26)

  \d
     268 /194522 =  0.138%

  digit
     268 /194522 =  0.138%

  IsNumber
     612 /194522 =  0.315%

  alpha
   91183 /194522 = 46.875% (lower:  1380, upper:  1160)

  alnum
   91451 /194522 = 47.013% (lower:  1380, upper:  1160)

  word
   91801 /194522 = 47.193% (lower:  1380, upper:  1160)

  graph
  102330 /194522 = 52.606% (lower:  1380, upper:  1160)

  print
  102349 /194522 = 52.616% (lower:  1380, upper:  1160)

  blank
      18 /194522 =  0.009%

  space
      24 /194522 =  0.012%

  punct
     374 /194522 =  0.192%

  cntrl
    6473 /194522 =  3.328%

Look especially at 'word', which is the same as \w. For ASCII that is
just [0-9A-Za-z_] (63 characters), but over the code points tested here
it matches 91801.

==8<===================
#!/usr/bin/perl
# Program-Id: unicount.pl
# Subject: show Unicode statistics

use strict ;
use warnings ;

use Data::Alias ;   # CPAN module, provides alias()

binmode STDOUT, ':utf8' ;

# One row per character class; C = match count, L = lowercase matches,
# U = uppercase matches (all filled in by the loop below).
my @table =
  # +--Name------+---qRegexp--------+-C-+-L-+-U-+
  (
    [ 'xdigit'   , qr/[[:xdigit:]]/ , 0 , 0 , 0 ] ,
    [ 'ascii'    , qr/[[:ascii:]]/  , 0 , 0 , 0 ] ,
    [ '\\d'      , qr/\d/           , 0 , 0 , 0 ] ,
    [ 'digit'    , qr/[[:digit:]]/  , 0 , 0 , 0 ] ,
    [ 'IsNumber' , qr/\p{IsNumber}/ , 0 , 0 , 0 ] ,
    [ 'alpha'    , qr/[[:alpha:]]/  , 0 , 0 , 0 ] ,
    [ 'alnum'    , qr/[[:alnum:]]/  , 0 , 0 , 0 ] ,
    [ 'word'     , qr/[[:word:]]/   , 0 , 0 , 0 ] ,
    [ 'graph'    , qr/[[:graph:]]/  , 0 , 0 , 0 ] ,
    [ 'print'    , qr/[[:print:]]/  , 0 , 0 , 0 ] ,
    [ 'blank'    , qr/[[:blank:]]/  , 0 , 0 , 0 ] ,
    [ 'space'    , qr/[[:space:]]/  , 0 , 0 , 0 ] ,
    [ 'punct'    , qr/[[:punct:]]/  , 0 , 0 , 0 ] ,
    [ 'cntrl'    , qr/[[:cntrl:]]/  , 0 , 0 , 0 ] ,
  ) ;

# Code points to test: planes 0..2, skipping the surrogates
# (U+D800..U+DFFF) and the noncharacters (U+FDD0..U+FDEF, U+xFFFE, U+xFFFF).
my @codepoints =
  (
    0x0000  .. 0xD7FF,
    0xE000  .. 0xFDCF,
    0xFDF0  .. 0xFFFD,
    0x10000 .. 0x1FFFD,
    0x20000 .. 0x2FFFD,
#   0x30000 .. 0x3FFFD, # etc.
  ) ;

for my $row ( @table )
{
  # alias, not copy: the increments below update the row in @table
  alias my ($name, $qrx, $count, $lower, $upper) = @$row ;

  printf "\n%s\n", $name ;

  my $n = 0 ;   # number of code points tested

  for ( @codepoints )
  {
    local $_ = chr ;   # int-2-char conversion
    $n++ ;

    if ( /$qrx/ )
    {
      $count++ ;
      $lower++ if / [[:lower:]] /x ;
      $upper++ if / [[:upper:]] /x ;
    }
  }

  # show the lower/upper counts only when there is something to show
  my $show_lower_upper =
    ($lower || $upper)
    ? sprintf( " (lower:%6d, upper:%6d)"
             , $lower
             , $upper
             )
    : '' ;

  printf "%6d /%6d =%7.3f%%%s\n"
       , $count
       , $n
       , 100 * $count / $n
       , $show_lower_upper ;
}
__END__

-- 
Affijn, Ruud

"Gewoon is een tijger."