On Aug 13, 5:18 pm, kettle <[EMAIL PROTECTED]> wrote:
> Hi,
> I was wondering how I ought to be handling character range
> translations in python.
>
> What I want to do is translate fullwidth numbers and roman alphabet
> characters into their halfwidth ascii equivalents.
> In perl I can do this pretty easily with tr:
>
> tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
> and I think the string.translate method is what I need to use to
> achieve the equivalent in python. Unfortunately the maketrans method
> doesn't seem to accept character ranges and I'm also having trouble
> with its interpretation of length. What I came up with was to first
> fudge the ranges:
>
> my_test_string = u"ABCDEFG"
> f_range = "".join([unichr(x) for x in range(ord(u"\uff00"),ord(u"\uff5e"))])
> t_range = "".join([unichr(x) for x in range(ord(u"\u0020"),ord(u"\u007e"))])
>
> then use these as input to maketrans:
>
> my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-93: ordinal not in range(128)
>
> but it generates an encoding error... and if I encode the ranges in
> utf8 before passing them on I get a length error because maketrans is
> counting bytes not characters and utf8 is variable width...
>
> my_trans_string = my_test_string.translate(string.maketrans(f_range.encode("utf8"),t_range.encode("utf8")))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ValueError: maketrans arguments must have same length
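The length mismatch described in the quoted message can be checked directly. The following is only a sketch, and it assumes Python 3 (chr in place of unichr), whereas the quoted session was Python 2. It rebuilds the same two ranges and compares their lengths in characters and in UTF-8 bytes. Note that range() stops one short of its end value, so U+FF5E and U+007E are not included, which matches the "position 0-93" in the first traceback.

```python
# Python 3 sketch of the ranges from the quoted message
# (assumption: chr() stands in for Python 2's unichr()).
f_range = "".join(chr(x) for x in range(0xFF00, 0xFF5E))
t_range = "".join(chr(x) for x in range(0x0020, 0x007E))

# Encode both ranges the way the second attempt did.
f_bytes = f_range.encode("utf8")
t_bytes = t_range.encode("utf8")

# Same number of characters on both sides...
print(len(f_range), len(t_range))   # 94 94
# ...but each U+FFxx character takes 3 bytes in UTF-8, ASCII takes 1.
print(len(f_bytes), len(t_bytes))   # 282 94
```

So a byte-oriented maketrans sees 282 bytes on one side and 94 on the other, hence "arguments must have same length".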
Ok so I guess I was barking up the wrong tree. Searching for
python 全角 半角 ("fullwidth halfwidth") quickly brought up a solution:

>>> import unicodedata
>>> my_test_string=u"[EMAIL PROTECTED]"
>>> print unicodedata.normalize('NFKC', my_test_string.decode("utf8"))
[EMAIL PROTECTED]@123
>>>

Still, it would be nice if there were a more general solution, or if
maketrans actually looked at chars instead of bytes, methinks.

-- 
http://mail.python.org/mailman/listinfo/python-list
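For the "more general solution" wished for above, one option is a translation table keyed by code point rather than by byte. This is a minimal sketch, assuming Python 3, where str.translate accepts a dict mapping ordinals to ordinals (Python 2's unicode.translate takes the same kind of dict). The names FULL_TO_HALF and to_halfwidth are made up for illustration. It uses the fact that the fullwidth forms U+FF01..U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021..U+007E, and it maps the ideographic space U+3000 separately.

```python
import unicodedata

# Hypothetical table: fullwidth U+FF01..U+FF5E -> ASCII U+0021..U+007E.
FULL_TO_HALF = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The fullwidth (ideographic) space lives elsewhere, at U+3000.
FULL_TO_HALF[0x3000] = 0x0020


def to_halfwidth(text):
    """Replace fullwidth ASCII variants with their halfwidth equivalents."""
    return text.translate(FULL_TO_HALF)


sample = "\uff21\uff22\uff23\uff11\uff12\uff13"  # fullwidth "ABC123"

print(to_halfwidth(sample))                    # ABC123
print(unicodedata.normalize("NFKC", sample))   # ABC123
```

The two approaches agree on this sample but are not interchangeable. NFKC applies every compatibility mapping, so for example it also turns halfwidth katakana into fullwidth katakana, while the table above only touches the code points listed in it.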