Well, sure, if it's just speed, conciseness and backwards-compatibility that you want ;-)
On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <[EMAIL PROTECTED]> wrote:
Michael Spencer wrote:
... conv = "".join(char.lower() for char in text if char not in unwanted)
Probably a good place to use str.replace, e.g.
conv = text.lower()
for char in unwanted:
    conv = conv.replace(char, '')
Some timings to support my assertion: =)
C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
10000 loops, best of 3: 74.6 usec per loop
C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
100000 loops, best of 3: 2.82 usec per loop
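The same comparison can also be run from a script with the timeit module. This is only a rough sketch, not the benchmark from the thread: the names strip_join and strip_replace are made up for illustration, and the result is returned rather than bound back to s, because timeit runs the setup and the statement in one namespace, so rebinding s would leave later loops working on an already-stripped string. Absolute numbers will vary with machine and Python version.

```python
import timeit

s = ''.join(map(str, range(100)))
unwanted = '01'

def strip_join(text, unwanted):
    # Filter character by character, as in the generator-expression version.
    return ''.join(c for c in text if c not in unwanted)

def strip_replace(text, unwanted):
    # One str.replace pass per unwanted character.
    for c in unwanted:
        text = text.replace(c, '')
    return text

# Both approaches should agree before their speed is compared.
assert strip_join(s, unwanted) == strip_replace(s, unwanted)

loops = 1000
t_join = min(timeit.repeat(lambda: strip_join(s, unwanted), repeat=3, number=loops))
t_replace = min(timeit.repeat(lambda: strip_replace(s, unwanted), repeat=3, number=loops))
print("join:    %.3f usec per loop" % (t_join / loops * 1e6))
print("replace: %.3f usec per loop" % (t_replace / loops * 1e6))
```

The lambda wrappers add a small constant call overhead to both measurements, so the ratio is a little less extreme than with bare statements on the command line.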
Good point - and there is string.maketrans to set up the table too. So normalize can be rewritten as:
If unwanted has more than one character in it, I would expect unwanted as deletechars in
>>> help(str.translate)
Help on method_descriptor:

translate(...)
    S.translate(table [,deletechars]) -> string

    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256.
to compete well, if table setup were for free (otherwise, UIAM, table should be ''.join([chr(i) for i in xrange(256)]) for identity translation, and that might pay for a couple of .replace loops, depending).
Regards, Bengt Richter
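For anyone reading this on Python 3, where str.translate takes only a table and the deletechars argument no longer exists, the same idea can be sketched with the three-argument form of str.maketrans, whose third argument lists the characters to delete. The helper name strip_translate is hypothetical, and this is an adaptation rather than code from the thread.

```python
def strip_translate(text, unwanted):
    # Empty first two arguments: no characters are remapped.
    # Third argument: characters mapped to None, i.e. deleted.
    table = str.maketrans('', '', unwanted)
    return text.translate(table)

s = ''.join(map(str, range(100)))
result = strip_translate(s, '01')

# Cross-check against the replace loop from earlier in the thread.
expected = s
for c in '01':
    expected = expected.replace(c, '')
assert result == expected

print(strip_translate('(UPPER CASE) lower case', '()'))  # UPPER CASE lower case
```

In Python 3 the table is a dict keyed by code point, so no 256-character identity string has to be built first.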
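from string import maketrans

def normalize1(text, unwanted = "()", table = maketrans("","")):
    text = text.lower()
    # str.translate returns a new string, so the result has to be rebound
    text = text.translate(table, unwanted)
    return set(text.split())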
which gives:
>>> t= timeit.Timer("normalize1('(UPPER CASE) lower case')", "from listmembers import normalize1")
>>> t.repeat(3,10000)
[0.29812783468287307, 0.29807782832722296, 0.3021370034462052]
But, while we're at it, we can use str.translate to do the case conversion too:
So:
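from string import maketrans, ascii_uppercase, ascii_lowercase

def normalize2(text, unwanted = "()", table = maketrans(ascii_uppercase,ascii_lowercase)):
    # the table does the lowercasing; keep the translated result
    text = text.translate(table, unwanted)
    return set(text.split())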
>>> t= timeit.Timer("normalize2('(UPPER CASE) lower case')", "from listmembers import normalize2")
>>> t.repeat(3,10000)
[0.24295154831133914, 0.24174497038029585, 0.25234855267899547]
...which is a little faster still
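A rough Python 3 equivalent of the single-pass version, for comparison. The name normalize3 is made up here, and the code assumes str.maketrans with three arguments (map uppercase to lowercase, delete the unwanted characters), which replaces the old table-plus-deletechars call.

```python
from string import ascii_lowercase, ascii_uppercase

def normalize3(text, unwanted="()"):
    # One table does both jobs: ASCII case folding and deletion.
    table = str.maketrans(ascii_uppercase, ascii_lowercase, unwanted)
    return set(text.translate(table).split())

print(sorted(normalize3('(UPPER CASE) lower case')))  # ['case', 'lower', 'upper']
```

Two caveats: the table is rebuilt on every call here, so it is worth hoisting out of the function if unwanted never changes, and only ASCII letters are lowercased, unlike str.lower().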
Thanks for the comments: they were interesting for me - hope some of this is useful to OP
Regards
Michael
-- http://mail.python.org/mailman/listinfo/python-list