On Tue, 2006-04-18 at 17:25 -0700, [EMAIL PROTECTED] wrote:
> Hi,
> I have a bunch of strings like
> a53bc_531.txt
> a53bc_2285.txt
> ...
> a53bc_359.txt
>
> and I want to extract the numbers 531, 2285, ..., 359.

Some ways:

1) Regular expressions, as you said. Note the character class [0-9]+:
   with [1-9]* a name such as a53bc_2085.txt would not match at all,
   because of the 0.

    >>> from re import compile
    >>> find = compile("a53bc_([0-9]+)\\.txt").findall
    >>> find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
    ['531', '2285', '359']

2) Using ''.split:

    >>> [x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
    ['531', '2285', '359']

3) Using indexes (be careful! This only works while the prefix is exactly
   6 characters and the suffix exactly 4, with no stray whitespace):

    >>> [x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
    ['531', '2285', '359']

Measuring speeds, first on the three names above (the regex was timed
with [1-9]* as the character class, which matches the same names on this
input):

    $ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' \
          'find(s)'
    100000 loops, best of 3: 3.03 usec per loop

    $ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
          "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
    100000 loops, best of 3: 7.64 usec per loop

    $ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
          "[x[6:-4] for x in s.splitlines()]"
    100000 loops, best of 3: 2.47 usec per loop

And on the same three names repeated 1000 times:

    $ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' \
          'find(s)'
    1000 loops, best of 3: 1.95 msec per loop

    $ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
          "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
    100 loops, best of 3: 6.51 msec per loop

    $ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
          "[x[6:-4] for x in s.splitlines()]"
    1000 loops, best of 3: 1.53 msec per loop

Summary: using indexes is less powerful than regexps, but faster. In these
runs slicing took about 20% less time than the regex (2.47 vs 3.03 usec,
1.53 vs 1.95 msec), and split was the slowest, at roughly 2.5 to 3.3 times
the regex time.

HTH,

-- 
Felipe.
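
P.S. In case it helps to have the three approaches side by side outside the interactive prompt, here is a minimal sketch in script form. The function names (by_regex, by_split, by_index) are made up for illustration, and the code assumes every name has the exact shape a53bc_<digits>.txt. The regex uses [0-9]+ so that numbers containing a 0 are picked up too; the third sample name is my own addition to exercise that case.

```python
import re

# Assumption: every name looks like a53bc_<digits>.txt.
# [0-9]+ (rather than [1-9]*) also accepts numbers that contain a 0.
_PATTERN = re.compile(r"a53bc_([0-9]+)\.txt")


def by_regex(text):
    """Return the digit run of every matching file name found in text."""
    return _PATTERN.findall(text)


def by_split(text):
    """Per line, take what lies between the first '_' and the first '.'."""
    return [line.split('.')[0].split('_')[1] for line in text.splitlines()]


def by_index(text):
    """Per line, cut off the 6-character prefix and the 4-character suffix."""
    return [line[6:-4] for line in text.splitlines()]


# One name per line; the last one contains a 0 on purpose.
names = "a53bc_531.txt\na53bc_2285.txt\na53bc_1060.txt"

print(by_regex(names))
print(by_split(names))
print(by_index(names))
```

All three calls should print the same list for well-formed input. They differ on messy input: by_index silently returns the wrong slice if a line carries trailing whitespace or a different prefix, whereas by_regex simply skips names that do not fit the pattern.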