how do i get a unicode char's number (its code point)? e.g. 03ba for greek small letter kappa, or the same number in decimal form?
Xah

Xah Lee wrote:
> python has this nice unicodedata module that deals with unicode nicely.
>
> #-*- coding: utf-8 -*-
> # python
>
> from unicodedata import *
>
> # each unicode char has a unique name.
> # one can use the "lookup" func to find it
>
> mychar=lookup('greek cApital letter sIgma')
> # note letter case doesn't matter
> print mychar.encode('utf-8')
>
> m=lookup('CJK UNIFIED IDEOGRAPH-5929')
> # for some reason, case must be right here.
> print m.encode('utf-8')
>
> # to find a char's name, use the "name" function
> print name(u'å')
>
> basically, in unicode, each char has a number of attributes (called
> properties) besides its name. These attributes provide the info needed
> to form letters and words, or for processing such as sorting and
> capitalization, in various human scripts. For example, the Latin
> alphabet has two forms, upper case and lower case. Korean letters are
> stacked together. Many symbols correspond to numbers, and there are
> also combining forms, used for example to put a bar over any letter or
> character. Also, some writing systems are directional. In order to form
> these symbols for display or process them for computing, this info is
> needed for each char.
>
> the rest of the functions in unicodedata return these attributes.
>
> see unicodedata doc:
> http://python.org/doc/2.4/lib/module-unicodedata.html
>
> Official word on unicode character properties:
> http://www.unicode.org/uni2book/ch04.pdf
>
> --
> i don't know what's the state of Perl's unicode. Is there something
> similar?
>
> --
> this post is archived at
> http://xahlee.org/perl-python/unicodedata_module.html
>
> Xah
> [EMAIL PROTECTED]
> http://xahlee.org/PageTwo_dir/more.html
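
One possible way to get at the number, as a minimal sketch. It assumes Python 3 (the quoted code is Python 2), where the built-in ord gives a char's code point as an integer and the unicodedata functions return the properties mentioned above. The helper name describe_char is made up for illustration; it is not part of the module.

```python
import unicodedata


def describe_char(ch):
    """Return the code point and a few Unicode properties of one character.

    describe_char is a hypothetical helper name, not part of unicodedata.
    """
    code_point = ord(ch)  # the char's number as a plain integer
    return {
        "decimal": code_point,
        "hex": format(code_point, "04x"),  # zero-padded lowercase hex
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # e.g. "Ll" = lowercase letter
        "combining": unicodedata.combining(ch),  # 0 for non-combining chars
        "bidirectional": unicodedata.bidirectional(ch),
    }


kappa = unicodedata.lookup("GREEK SMALL LETTER KAPPA")
info = describe_char(kappa)
print(info["decimal"], info["hex"], info["name"])
```

Going the other way, chr(0x03ba) turns the number back into the char.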