On 8/18/2014 3:16 PM, Alex Willmer wrote:
A challenge, just for fun. Can you speed up this function?
You should give a specification here, with examples. You should perhaps be using .maketrans and .translate.
    import string

    charset = set(string.ascii_letters + string.digits + '@_-')
    byteseq = [chr(i) for i in xrange(256)]
    bytemap = {byte: byte if byte in charset else '+' + byte.encode('hex')
               for byte in byteseq}

    def plus_encode(s):
        """Encode a unicode string with only ascii letters, digits, _, -, @, +
        """
        bytemap_ = bytemap
        s_utf8 = s.encode('utf-8')
        return ''.join([bytemap[byte] for byte in s_utf8])

On my machine (Ubuntu 14.04, CPython 2.7.6, PyPy 2.2.1) this gets

    alex@martha:~$ python -m timeit -s 'import plus_encode' 'plus_encode.plus_encode(u"""qwertyuiop1234567890!"£$%^&*()€""")'
    100000 loops, best of 3: 2.96 usec per loop
    alex@martha:~$ pypy -m timeit -s 'import plus_encode' 'plus_encode.plus_encode(u"""qwertyuiop1234567890!"£$%^&*()€""")'
    1000000 loops, best of 3: 1.24 usec per loop

Back story:
Last week we needed a custom encoding to store unicode usernames in a config file that only allowed mixed-case ascii, digits, underscore, dash, at-sign and plus sign. We also wanted to keep the encoded usernames somewhat human readable.

My design was utf-8 and a variant of %-escaping, using the plus symbol. So u'alic€123' would be encoded as b'alic+e2+82+ac123'.

This evening as a learning exercise I've tried to make it fast. This is the result. This challenge is just for fun. The chosen solution ended up being

    import re

    def name_encode(s):
        return '%s_%s' % (s.encode('utf-8').encode('hex'),
                          re.sub('[A-Za-z0-9]', '', s))

Regards, Alex
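For illustration, here is a rough sketch of what the .maketrans/.translate suggestion could look like. It is written for Python 3 rather than the Python 2.7 used in the post, and the names SAFE and TABLE are made up for this example. The trick of decoding the UTF-8 bytes as latin-1, so that every byte becomes one character that str.translate can map, is one possible approach and not something from the original message. It has not been timed against the versions above.

```python
import string

# Byte values that pass through unchanged (same characters as `charset` above).
SAFE = set((string.ascii_letters + string.digits + '@_-').encode('ascii'))

# Translation table for code points 0..255: a safe byte maps to itself,
# anything else maps to '+' followed by two lowercase hex digits.
TABLE = str.maketrans(
    {i: chr(i) if i in SAFE else '+%02x' % i for i in range(256)}
)

def plus_encode(s):
    """Encode a str using only ascii letters, digits, _, -, @ and +."""
    # latin-1 turns each UTF-8 byte into the code point of the same value,
    # so the result can be fed through str.translate in a single call.
    return s.encode('utf-8').decode('latin-1').translate(TABLE)

# A few examples that double as a small specification.
print(plus_encode('alic€123'))   # alic+e2+82+ac123
print(plus_encode('a b'))        # a+20b
print(plus_encode('+'))          # +2b
```

This version returns a str; append .encode('ascii') if bytes are needed for the config file.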
--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list