Hi, I was wondering how I ought to be handling character range translations in Python.
What I want to do is translate fullwidth numbers and roman alphabet characters into their halfwidth ASCII equivalents. In Perl I can do this pretty easily with tr:

    tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;

and I think the string.translate method is what I need to use to achieve the equivalent in Python. Unfortunately the maketrans function doesn't seem to accept character ranges, and I'm also having trouble with its interpretation of length.

What I came up with was to first fudge the ranges:

    import string
    my_test_string = u"ABCDEFG"
    f_range = "".join([unichr(x) for x in range(ord(u"\uff00"),ord(u"\uff5e"))])
    t_range = "".join([unichr(x) for x in range(ord(u"\u0020"),ord(u"\u007e"))])

then use these as input to maketrans, but it generates an encoding error:

    my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-93: ordinal not in range(128)

If I encode the ranges in UTF-8 before passing them on, I get a length error, because maketrans is counting bytes, not characters, and UTF-8 is variable width:

    my_trans_string = my_test_string.translate(string.maketrans(f_range.encode("utf8"),t_range.encode("utf8")))
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: maketrans arguments must have same length
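For illustration, here is a minimal sketch of one way the range translation described above could be done without maketrans. It assumes Python 3, where str is Unicode and str.translate accepts a dict mapping code points to code points (the Python 2 unicode.translate method takes the same kind of mapping). The names fullwidth_to_ascii and to_halfwidth are hypothetical, and the sketch starts the range at U+FF01 and maps U+3000 separately, on the assumption that the intended "fullwidth space" is the ideographic space rather than U+FF00.

```python
# Assumption: Python 3, where str is Unicode and str.translate accepts
# a dict mapping code points (ints) to code points.
FULLWIDTH_START = 0xFF01  # FULLWIDTH EXCLAMATION MARK
FULLWIDTH_END = 0xFF5E    # FULLWIDTH TILDE
OFFSET = FULLWIDTH_START - 0x0021  # distance to the ASCII counterpart

# range() excludes its upper bound, so add 1 to include U+FF5E.
fullwidth_to_ascii = {
    cp: cp - OFFSET for cp in range(FULLWIDTH_START, FULLWIDTH_END + 1)
}
# The ideographic (fullwidth) space is U+3000, outside the block above.
fullwidth_to_ascii[0x3000] = 0x0020


def to_halfwidth(text):
    """Return text with fullwidth ASCII variants replaced by plain ASCII."""
    return text.translate(fullwidth_to_ascii)


# Fullwidth "ABC123" written with escapes so the source stays ASCII.
print(to_halfwidth("\uff21\uff22\uff23\uff11\uff12\uff13"))
```

Because the table is keyed by code point rather than by byte, the length mismatch from the UTF-8 attempt does not arise. If folding other compatibility characters as well is acceptable, unicodedata.normalize("NFKC", text) may be a simpler alternative.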