Bengt Richter wrote:
On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <[EMAIL PROTECTED]> wrote:


Michael Spencer wrote:

... conv = "".join(char.lower() for char in text if char not in unwanted)

Probably a good place to use str.replace, e.g.

conv = text.lower()
for char in unwanted:
   conv = conv.replace(char, '')

Some timings to support my assertion: =)

C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
10000 loops, best of 3: 74.6 usec per loop


C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
100000 loops, best of 3: 2.82 usec per loop
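
For anyone re-running this comparison from a script rather than the command line, here is a minimal sketch using the timeit module. It assumes Python 3, and the helper names strip_join and strip_replace are made up for illustration. Note that both command lines above rebind s inside the timed statement, so every pass after the first works on an already-stripped string; the sketch returns the result instead, so each pass sees the same input. No timings are quoted here, since they depend on the machine and interpreter.

```python
import timeit

S = "".join(map(str, range(100)))
UNWANTED = "01"

def strip_join(s, unwanted=UNWANTED):
    # Filter character by character with a generator expression.
    return "".join(c for c in s if c not in unwanted)

def strip_replace(s, unwanted=UNWANTED):
    # One str.replace call per unwanted character.
    for c in unwanted:
        s = s.replace(c, "")
    return s

if __name__ == "__main__":
    for func in (strip_join, strip_replace):
        # Best of 3 runs of 10000 calls each, mirroring the runs above.
        best = min(timeit.repeat(lambda: func(S), number=10000, repeat=3))
        print("%s: %.4f s per 10000 calls" % (func.__name__, best))
```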


Well, sure, if it's just speed, conciseness and backwards-compatibility that you want ;-)


If unwanted has more than one character in it, I would expect unwanted as deletechars in

 >>> help(str.translate)
 Help on method_descriptor:

 translate(...)
     S.translate(table [,deletechars]) -> string

     Return a copy of the string S, where all characters occurring
     in the optional argument deletechars are removed, and the
     remaining characters have been mapped through the given
     translation table, which must be a string of length 256.

to compete well, if table setup were free
(otherwise, unless I am mistaken, table should be ''.join([chr(i) for i in xrange(256)])
for identity translation, and building it might cost as much as a couple of
.replace loops, depending).
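
A minimal sketch of the deletechars idea, assuming Python 3 rather than the 2.x API quoted above: there str.translate takes only a table, and the three-argument form of str.maketrans builds a table whose third argument lists the characters to delete, so no 256-character identity string is needed. The name strip_chars is only illustrative.

```python
def strip_chars(text, unwanted):
    # str.maketrans("", "", unwanted) maps each unwanted character to None,
    # which str.translate treats as "delete this character".
    table = str.maketrans("", "", unwanted)
    return text.translate(table)

print(strip_chars("(UPPER CASE) lower case", "()"))
```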

Regards,
Bengt Richter
Good point - and there is string.maketrans to set up the table too. So normalize can be rewritten as:


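from string import maketrans  # Python 2

def normalize1(text, unwanted = "()", table = maketrans("","")):
    text = text.lower()
    # str.translate returns a new string, so the result must be kept
    text = text.translate(table, unwanted)
    return set(text.split())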

which gives:
>>> t= timeit.Timer("normalize1('(UPPER CASE) lower case')", "from listmembers import normalize1")
>>> t.repeat(3,10000)
[0.29812783468287307, 0.29807782832722296, 0.3021370034462052]



But, while we're at it, we can use str.translate to do the case conversion too:

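from string import maketrans, ascii_uppercase, ascii_lowercase  # Python 2

def normalize2(text, unwanted = "()", table = maketrans(ascii_uppercase,ascii_lowercase)):
    # lowercase and delete in one pass; keep the returned string
    text = text.translate(table, unwanted)
    return set(text.split())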


>>> t= timeit.Timer("normalize2('(UPPER CASE) lower case')", "from listmembers import normalize2")
>>> t.repeat(3,10000)
[0.24295154831133914, 0.24174497038029585, 0.25234855267899547]



...which is a little faster still
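
For readers on Python 3, where string.maketrans no longer exists, a rough equivalent of normalize2 might look like the sketch below. The name normalize3 and the module-level _TABLE are hypothetical, not part of the code above, and like normalize2 it only lowercases ASCII letters.

```python
from string import ascii_lowercase, ascii_uppercase

# Built once: map A-Z to a-z and delete the parentheses in the same pass.
_TABLE = str.maketrans(ascii_uppercase, ascii_lowercase, "()")

def normalize3(text, table=_TABLE):
    # translate() applies both the case mapping and the deletions,
    # then split() breaks the result on whitespace.
    return set(text.translate(table).split())
```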

Thanks for the comments: they were interesting for me - hope some of this is useful to OP

Regards

Michael







