On Tue, 24 Jul 2007 00:23:46 -0300, Gordon Airporte <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
>> if your search is not overly complicated, i think regexp is not
>> needed. if you want, you can post a sample what you want to search,
>> and some sample input.
>
> I'm afraid it's pretty complicated :-). I'm doing analysis of hand
> histories that online poker sites leave for you. Here's one hand of a
> play money ring game:
>
>
> Full Tilt Poker Game #2042984473: Table Play Chip 344 - 10/20 - Limit
> Hold'em - 18:07:20 ET - 2007/03/22
> Seat 1: grandmarambo (1,595)
> Seat 4: justnoldfoolm (2,430)
> justnoldfoolm posts the small blind of 5
> rickrn posts the big blind of 10
> The button is in seat #1
> *** HOLE CARDS ***
> Dealt to moi [Jd 2c]
> justnoldfoolm bets 10
> [more sample lines]
>
> So I'm picking out all kinds of info about my cards, my stack, my
> betting, my position, board cards, other people's cards, etc. For
> example, this pattern picks out which player bet and how much:
>
> betsRe = re.compile('^(.*) bets ([\d,]*)')
>
> I have 13 such patterns. The files I'm analyzing are just a session's
> worth of histories like this, separated by \n\n\n. All of this
> information needs to be organized by hand or by when it happened in a
> hand, so I can't just run patterns over the whole file or I'll lose
> context.
> (Of course, in theory I could write a single monster expression that
> would chop it all up properly and organize by context, but it would be
> next to impossible to write/debug/maintain.)

But you don't HAVE to use a regular expression.
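
To keep the per-hand context mentioned above, one possibility is to split the session text on the blank-line separator first and then walk each hand line by line. This is only a rough sketch: the helper names (split_hands, bets_by_hand) are hypothetical, and the second hand in the sample (game number, bet) is made up for illustration, not taken from a real history.

```python
def split_hands(session_text):
    """Return one list of non-empty lines per hand.

    Assumes hands are separated by three consecutive newlines,
    as described in the quoted message.
    """
    hands = []
    for chunk in session_text.split("\n\n\n"):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if lines:
            hands.append(lines)
    return hands


def bets_by_hand(session_text):
    """Collect (player, amount) pairs for each hand using only string methods."""
    result = []
    for lines in split_hands(session_text):
        bets = []
        for line in lines:
            # partition returns an empty separator when " bets " is absent
            who, sep, amount = line.partition(" bets ")
            if sep:
                bets.append((who, amount))
        result.append(bets)
    return result


# Abridged sample; the second hand is invented for illustration only.
session = (
    "Full Tilt Poker Game #2042984473: Table Play Chip 344\n"
    "Dealt to moi [Jd 2c]\n"
    "justnoldfoolm bets 10\n"
    "\n\n"
    "Full Tilt Poker Game #2042984474: Table Play Chip 344\n"
    "rickrn bets 20\n"
)

print(bets_by_hand(session))
```

Each inner list belongs to exactly one hand, so whatever is extracted from a line stays attached to the hand it came from.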
For input this simple and predictable, using str.partition or 'xxx in string' is around 4x faster:

import re
import timeit

betsRe = re.compile(r'^(.*) bets ([\d,]*)')

def test_partition(line):
    # Split on the literal " bets "; bets is empty when it is absent.
    who, bets, amount = line.partition(" bets ")
    if bets:
        return who, amount

def test_re(line):
    r = betsRe.match(line)
    if r:
        return r.group(1), r.group(2)

# Both versions agree on a matching and a non-matching line.
line1 = "justnoldfoolm bets 10"
assert test_re(line1) == test_partition(line1) == ("justnoldfoolm", "10")
line2 = "Uncalled bet of 20 returned to justnoldfoolm"
assert test_re(line2) is None and test_partition(line2) is None

py> timeit.Timer("test_partition(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[1.1922188434563594, 1.2086988709458808, 1.1956522407177488]
py> timeit.Timer("test_re(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[5.2871529761464018, 5.2763971398599523, 5.2791986132315714]

As is often the case, a regular expression is NOT the right tool here.

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list