Charles Hartman <[EMAIL PROTECTED]> wrote:I'm still shaky on some of sre's syntax. Here's the task: I've got strings (never longer than about a dozen characters) that are guaranteed to be made only of characters 'x' and '/'.
One possibility is to cheat completely, and depending on memory constraints
and how often you need to do this, it might even make sense to do so.
Interesting -- I hadn't thought of that. It would, realistically, probably be 2 ** 14 or so, but what the hell. Trouble is, this would require me to pick out by hand/eye -- once only, admittedly -- something like 16,384 "longests," which I do not care to do. If I can compute them, instead, there's no reason not to do it on the fly; it doesn 't take *that* long.
There's only two possible values for each character, so you're really
dealing with binary numbers. The max length is 12 characters (I'm
deliberately being oblivious to where you said "about" :-)), so there are
only 2^12 or 4096 possible strings. You could probably afford to
pre-compute all 4096 strings, use them as dictionary keys, and store a
(start, end) tuple for each key. Computation then becomes a simple
dictionary lookup.
In each string I want to find the longest continuous stretch of pairs whose first character is 'x' and the second is either mark. So if marks = '/xx/xxx///', the "winning" stretch begins at position 2 and is 6 characters long ('x/xxx/'), which requires finding a second match that overlaps the first match (which is just 'xx' in position 1). (When there are multiple "winning" stretches, I might want to adjudicate among them, but that's a separate problem.) I hope this is clear enough.
Unfortunately, no, it's not very clear. In fact, I have no clue what you're trying to do. Could you try explaining it again?
See below.
Charles Hartman Professor of English, Poet in Residence
Hmmm. Are you, by any chance, looking for meter patterns in verse?
Yes, of course. My program that scans iambic pentameters (Scandroid, ver. 0.2a) is available (GPL) on the "Programs" page of my site listed below. It pretty much works, which is interesting. (People who know how to do this generally don't believe a program can do it.) I'm a week or two (knock on wood) from version 1.0, which handles anapestic meters as well, and non-pentameter lengths, and do other stuff.
I'm not sure which part of my explanation wasn't clear. The string ('/' and 'x' only) will ultimately be divided into groups of one, two, three, or four marks, by many complex operations. One approach to the whole task (an approach I just discovered a month or two ago, which supplements a more chunk-by-chunk method) is to look for the longest substring within that string of (perhaps) 10 or twelve mrks, of which the following is true: if the substring is divided into pairs, each pair consists of a 'x' followed by either a 'x' or a '/'. The general problem is to the find the longest possible substring matching an sre specfication -- and a general solution to that doesn't seem to be obvious, though the need for it must arise often enough.
What I want is code (a line, a function . . .) that returns the beginning-point and length of that longest substring. (I could ask for the substring itself; it doesn't matter; the information sought will be some subset of the information supplied with every sre match object.) For my current "best" solution, see previous message. Hope this is any clearer.
Charles Hartman Professor of English, Poet in Residence http://cherry.conncoll.edu/cohar http://villex.blogspot.com
-- http://mail.python.org/mailman/listinfo/python-list