Re: std.regex with multiple matches

Kai Meyer Thu, 21 Apr 2011 13:30:18 -0700

On 04/21/2011 11:43 AM, David Gileadi wrote:

I was using std.regex yesterday, matching a regular expression against a
string with the "g" flag to find multiple matches. As the example from
the docs shows (BTW I think the example may be wrong; I think it needs
the "g" flag added to the regex call), you can do a foreach loop on the
matches like:


foreach(m; match("abcabcabab", regex("ab", "g")))
{
    writefln("%s[%s]%s", m.pre, m.hit, m.post);
}

Each match "m" is a RegexMatch, which includes .pre, .hit, and .post
properties to return ranges of everything before, inside, and after the
match.

However what I really wanted was a way to get the range between matches,
i.e. since I had multiple matches I wanted something like m.upToNextMatch.

Since I'm not very familiar with ranges, am I missing some obvious way
of doing this with the existing .pre, .hit and .post properties?

-Dave


There are two ways I can think of off the top of my head.

I don't think std.regex supports lookahead, but if it did, you could match something, then capture the portion after it (in m.captures[1]) that matches everything up to the lookahead (which is the same thing you matched in the first place).
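For what it's worth, the std.regex in current Phobos (not necessarily the 2011 version discussed here) does accept lookahead assertions written as (?=...). Below is a minimal sketch of that first approach, assuming such a version plus matchAll; upToNextMatch is a hypothetical helper name, the pattern hard-codes "ab" from Dave's example, and the capture is read as m[1] (the current spelling of m.captures[1]).

```d
import std.algorithm : map;
import std.array : array;
import std.regex;
import std.stdio;

// Hypothetical helper (not part of std.regex): for each "ab" in s, return
// the text between it and the next "ab", or the end of the string.
string[] upToNextMatch(string s)
{
    // "ab" is the thing we are really looking for. (.*?) lazily captures
    // what follows it, and (?=ab|$) is a zero-width lookahead that stops the
    // capture right before the next "ab" (or at the end of input) without
    // consuming it, so matchAll can find that "ab" on the next iteration.
    auto re = regex(r"ab(.*?)(?=ab|$)");
    return matchAll(s, re).map!(m => m[1]).array;
}

void main()
{
    writeln(upToNextMatch("abcabcabab"));
}
```

Because the lookahead does not consume the following "ab", consecutive matches are not skipped, and the capture for the last match simply runs to the end of the string.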

Otherwise, you could manually capture the ranges like this (it matches the first word character after each word boundary, then prints the rest of the word up to the next word boundary that is followed by a word character):


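import std.stdio;
import std.regex;

void main()
{
    size_t last_pos;   // offset where the previous match started
    size_t last_size;  // length of the previous match
    string abc = "the quick brown fox jumped over the lazy dog";
    // The "g" flag makes the foreach visit every match, not just the first.
    foreach(m; match(abc, regex(r"\b\w", "g")))
    {
        // m.pre.length is the offset at which the current match starts, so
        // this slice is the text between the previous match and this one.
        writefln("between: '%s'", abc[last_pos + last_size..m.pre.length]);
        writefln("%s[%s]%s", m.pre, m.hit, m.post);
        last_size = m.hit.length;
        last_pos = m.pre.length;
    }
    // Whatever follows the last match.
    writefln("between: '%s'", abc[last_pos + last_size..$]);
}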
// Prints:
// between: ''
// [t]he quick brown fox jumped over the lazy dog
// between: 'he '
// the [q]uick brown fox jumped over the lazy dog
// between: 'uick '
// the quick [b]rown fox jumped over the lazy dog
// between: 'rown '
// the quick brown [f]ox jumped over the lazy dog
// between: 'ox '
// the quick brown fox [j]umped over the lazy dog
// between: 'umped '
// the quick brown fox jumped [o]ver the lazy dog
// between: 'ver '
// the quick brown fox jumped over [t]he lazy dog
// between: 'he '
// the quick brown fox jumped over the [l]azy dog
// between: 'azy '
// the quick brown fox jumped over the lazy [d]og
// between: 'og'
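The same bookkeeping can be wrapped in a function that returns the pieces instead of printing them. This is a sketch assuming current Phobos, where matchAll iterates every match (with the 2011 library you would use match with the "g" flag instead); betweenMatches is a hypothetical name, not a std.regex function.

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not part of std.regex): returns the text before the
// first match, between each pair of consecutive matches, and after the last
// match of re in s.
string[] betweenMatches(string s, Regex!char re)
{
    string[] pieces;
    size_t end = 0; // index just past the previous match
    foreach (m; matchAll(s, re))
    {
        // m.pre is the slice of s before the match, so its length is the
        // offset where the match starts.
        immutable start = m.pre.length;
        pieces ~= s[end .. start];
        end = start + m.hit.length;
    }
    pieces ~= s[end .. $]; // text after the last match
    return pieces;
}

void main()
{
    auto abc = "the quick brown fox jumped over the lazy dog";
    foreach (piece; betweenMatches(abc, regex(r"\b\w")))
        writefln("between: '%s'", piece);
}
```

Tracking a single "end of previous match" index is equivalent to the last_pos + last_size sum above, just with one variable instead of two.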

If you replace '\b\w' with '\s', the output shows more clearly how it works:

between: 'the'
the[ ]quick brown fox jumped over the lazy dog
between: 'quick'
the quick[ ]brown fox jumped over the lazy dog
between: 'brown'
the quick brown[ ]fox jumped over the lazy dog
between: 'fox'
the quick brown fox[ ]jumped over the lazy dog
between: 'jumped'
the quick brown fox jumped[ ]over the lazy dog
between: 'over'
the quick brown fox jumped over[ ]the lazy dog
between: 'the'
the quick brown fox jumped over the[ ]lazy dog
between: 'lazy'
the quick brown fox jumped over the lazy[ ]dog
between: 'dog'
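In the '\s' case, the "between" pieces are the same ones std.regex's split function returns (splitter is its lazy counterpart), so if you only need the text between matches and not the matches themselves, that may be the shortest route. A sketch assuming current Phobos:

```d
import std.regex : regex, split;
import std.stdio : writeln;

void main()
{
    auto abc = "the quick brown fox jumped over the lazy dog";
    // split drops each match and returns the text around it; here that is
    // "the" before the first space, each word between spaces, and "dog"
    // after the last space.
    string[] pieces = split(abc, regex(r"\s"));
    writeln(pieces);
}
```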
