On Fri, 11 Mar 2005 08:38:36 -0500, Charles Hartman wrote: > I'm still shaky on some of sre's syntax. Here's the task: I've got > strings (never longer than about a dozen characters) that are > guaranteed to be made only of characters 'x' and '/'. In each string I > want to find the longest continuous stretch of pairs whose first > character is 'x' and the second is either mark. So if marks = > '/xx/xxx///', the "winning" stretch begins at position 2 and is 6 > characters long ('x/xxx/')
I don't think regexes are a good match for this task. They just aren't optimized for this very well. Here's what I have, though I don't know if it's *exactly* what you want: def xCounter(s, initialOffset): offset = initialOffset count = 0 slen = len(s) while s[offset] == 'x': count = count + 1 offset += 2 if offset > slen: break return count, s[initialOffset:offset] def longestRun(s): return max([xCounter(s, i) for i in range(len(s))])[1] In other words, start at all the positions and find the max, sort of "by hand". (In particular, I don't know how this will handle two runs of equal size, if it will prefer the first or the last. I don't know how *you* handle that, either, so I'm not sweating it for now. If you clarify this I can help, the easiest way is to add another number into the tuple that xCounter generates for max to work on.) This is not optimized, and if you're processing millions of these, this may be too slow. What I'd suggest is that if this isn't fast enough, memoize the function, i.e., with this: longestRunResults = {} def memoizedLongestRun(s): try: return longestRunResults[s] except KeyError: pass result = longestRun(s) longestRunResults[s] = result return result You'll probably want to change the names of things to better fit. This will get you the benefits of pre-computation without paying the up-front costs (you only compute what you need, not all possible combinations), and this ought to be plenty fast to process a whole lotta data really quickly. (This is a great example of why readable code can be more important than fast code; the speedup can be added later, it's more important that you can satisfy yourself that the code is correct, and a "clever" algorithm might make that hard.) -- http://mail.python.org/mailman/listinfo/python-list