kettle wrote:
> I was wondering how I ought to be handling character range
> translations in python.
>
> What I want to do is translate fullwidth numbers and roman alphabet
> characters into their halfwidth ascii equivalents.
> In perl I can do this pretty easily with tr:
>
> tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
> and I think the string.translate method is what I need to use to
> achieve the equivalent in python. Unfortunately the maketrans method
> doesn't seem to accept character ranges and I'm also having trouble
> with its interpretation of length. What I came up with was to first
> fudge the ranges:
>
> my_test_string = u"ABCDEFG"
> f_range = "".join([unichr(x) for x in
>     range(ord(u"\uff00"),ord(u"\uff5e"))])
> t_range = "".join([unichr(x) for x in
>     range(ord(u"\u0020"),ord(u"\u007e"))])
>
> then use these as input to maketrans:
> my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-93: ordinal not in range(128)
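A side note on the range-building approach in the question: Python's range() excludes its end point, so range(ord(u"\uff00"), ord(u"\uff5e")) stops at U+FF5D, one character short of the inclusive perl range (hence the 94 characters, positions 0-93, in the traceback). The sketch below assumes Python 3, where str is Unicode and str.maketrans does accept two equal-length strings; the helper name char_range is hypothetical.

```python
# Python 3 sketch (assumption: not the Python 2 setup from the question).
# str.maketrans(x, y) with two equal-length strings returns a dict
# mapping each code point in x to the code point at the same index in y.

def char_range(first, last):
    """Hypothetical helper: characters first..last *inclusive*, like a perl tr range."""
    # range() stops before its end value, so add 1 to include `last`.
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))

f_range = char_range("\uff00", "\uff5e")  # the fullwidth span from the perl tr
t_range = char_range("\u0020", "\u007e")  # printable ASCII, same length
table = str.maketrans(f_range, t_range)

# Fullwidth "ABC123" written with escapes to keep the source ASCII-only.
print("\uff21\uff22\uff23\uff11\uff12\uff13".translate(table))  # ABC123
```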
maketrans only works for byte strings.

as for translate itself, it has different signatures for byte strings
and unicode strings. in the former case, it takes a lookup table
represented as a 256-byte string (e.g. created by maketrans); in the
latter case, it takes a dictionary mapping from ordinals to ordinals
or unicode strings. something like

    # U+FF00..U+FF5E -> U+0020..U+007E, the same span as the perl tr
    lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x5f))
    new_string = old_string.translate(lut)

could work (untested).

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
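A minimal Python 3 sketch of the dictionary-based lookup table described in the reply, assuming the goal is the same U+FF00..U+FF5E to U+0020..U+007E mapping as the perl tr. In Python 3, str.translate takes the role of unicode.translate. The helper name to_halfwidth and the extra U+3000 entry (showing that a value may be a string rather than an ordinal) are illustrative additions, not part of the original message.

```python
# Python 3 sketch: keys are code points (ints); values may be code points,
# strings, or None (delete the character).
lut = {0xFF00 + offset: 0x0020 + offset for offset in range(0x5F)}

# Illustrative extra entry: map IDEOGRAPHIC SPACE to a plain ASCII space,
# using a string value instead of an ordinal.
lut[0x3000] = " "

def to_halfwidth(text):
    """Hypothetical helper: fold fullwidth ASCII variants to plain ASCII."""
    return text.translate(lut)

# Fullwidth "Hello", an ideographic space, fullwidth "123", then ASCII "!".
sample = "\uff28\uff45\uff4c\uff4c\uff4f\u3000\uff11\uff12\uff13!"
print(to_halfwidth(sample))  # Hello 123!
```

Characters with no entry in the table, such as the halfwidth katakana from U+FF61 on, pass through unchanged.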