Re: Problem with regex

Paul Johnson Mon, 06 Oct 2014 10:32:18 -0700

On Mon, Oct 06, 2014 at 09:34:15PM +0530, punit jain wrote:
> Hi,
> 
> I have a regex problem. My current working program is like this :-
> 
> 
> #!/usr/bin/perl
> 
> *$str='ldap:///uid=user1,ou=People,o=test.com
> <http://test.com>,mailto:us...@test.com
> <us...@test.com>,ldap:///uid=user2,ou=People,o=test.com
> <http://test.com>,ldap:///uid=user3,ou=People,o=test.com
> <http://test.com>';*
> 
> while($str=~/^(?:mailto:|ldap:\/\/\/)(.*?),(?:mailto:|ldap:\/\/\/)/){
> print "first = $1"."\n";
> 
>        # Process user for send email
>          ProcessUser($first);
>     if($str =~ /^(?:ldap:\/\/\/.+?|mailto:.+?),(ldap.*|mailto.*)/) {
> print "remain = $1"."\n";
> $str=$1;
> }
> }
> 
> However when I have input string as :-
> 
> 'ldap:///uid=user1,ou=People,o=test.com,a...@test.com*,t...@test.com
> <t...@test.com>,r...@test.com <r...@test.com>*,ldap:///uid=user2,ou=People,o=
> test.com,ldap:///uid=user3,ou=People,o=test.com'
> 
> it breaks. I tried multiple regex, however could not get it working.
> Any clue on regex changes to accomodate this ?


First, pay attention to what Kent has written.

But to get to specific problems:

I think you are misunderstanding how to use while with a regex match.
You don't need to replace the string with the unmatched part - perl will
handle that sort of low-level messing about for you.  You just need to
use the /g flag and the next match will start off where the previous one
finished.

To get this to work, don't anchor your regex to the start of the string,
or you'll only get one match.
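
A minimal sketch of both points, using a made-up string rather than your data (the names $s, @pairs and @anchored are only for illustration):

```perl
use strict;
use warnings;

my $s = 'a=1,b=2,c=3';   # hypothetical sample, not the poster's data

# Unanchored with /g: each iteration resumes at pos($s), so every
# pair is found and $s itself is never modified.
my @pairs;
while ($s =~ /(\w)=(\d)/g) {
    push @pairs, "$1=$2 ends at " . pos($s);
}

# Anchored with ^: only the start of the string can match, so the
# second attempt fails and the loop runs exactly once.
my @anchored;
while ($s =~ /^(\w)=(\d)/g) {
    push @anchored, "$1=$2";
}

print "unanchored: $_\n" for @pairs;
print "anchored:   $_\n" for @anchored;
```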

Also, your check that you are only matching up to the next mailto or
ldap section needs to ensure that it doesn't consume that part of the
string, otherwise the next match has nothing left to start from.  This
is done by using a positive lookahead assertion, (?=...).

Then you need to allow for matching the last part of the string, which
will not be followed by another part to match.
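
To see the difference on a toy string (again made up for illustration, with "x:" standing in for the mailto:/ldap:/// prefixes), here is a rough comparison of consuming the next prefix, looking ahead for it, and looking ahead for it or the end of the string:

```perl
use strict;
use warnings;

my $s = 'x:one,x:two,x:three';   # hypothetical sample

# Consuming the following "x:" swallows the start of the next item,
# so the next match has no prefix left to begin with.
my @consuming = $s =~ /x:(.*?),x:/g;

# A lookahead only checks that ",x:" follows; it is left in place
# for the next match.  The last item is still missed.
my @lookahead = $s =~ /x:(.*?)(?=,x:)/g;

# Allowing the end of the string as an alternative picks up the
# last item too.
my @with_end = $s =~ /x:(.*?)(?=,x:|$)/g;

print "consuming: @consuming\n";
print "lookahead: @lookahead\n";
print "with end:  @with_end\n";
```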

Finally, it looks like you have newline characters in your input.  If
that is the case then you need to add the /s flag so that . will also
match a newline.
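
A small sketch of what /s changes, assuming an item that is wrapped across a newline (toy data again, with "x:" as a stand-in prefix):

```perl
use strict;
use warnings;

my $s = "x:one\ntwo,x:three";   # hypothetical sample with an embedded newline

# Without /s, . stops at the newline, so the first item cannot
# reach its terminating ",x:" and is not matched at all.
my @without_s = $s =~ /x:(.*?)(?=,x:|$)/g;

# With /s, . also matches the newline and the first item is
# captured whole, newline included.
my @with_s = $s =~ /x:(.*?)(?=,x:|$)/sg;

print scalar(@without_s), " item(s) without /s\n";
print scalar(@with_s),    " item(s) with /s\n";
```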

And for style, you can pull out the duplicated parts of the regex and
use another delimiter to avoid Leaning Toothpick Syndrome.

Putting it together you get:

    my $start = qr!mailto:|ldap:///!;
    while ($str =~ /$start(.*?)(?=,$start|$)/sg) {
        print "first = $1\n";
    }
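
As a self-contained sketch you can run, assuming input shaped like your first string (the address someone@example.com and the @entries array are made up for the example):

```perl
use strict;
use warnings;

# Hypothetical input: the address is invented; the ldap parts follow
# the shape of the original string.
my $str = 'ldap:///uid=user1,ou=People,o=test.com,'
        . 'mailto:someone@example.com,'
        . 'ldap:///uid=user2,ou=People,o=test.com';

my $start = qr!mailto:|ldap:///!;

# Collect everything after each prefix, up to the next ",prefix"
# or the end of the string.
my @entries;
while ($str =~ /$start(.*?)(?=,$start|$)/sg) {
    push @entries, $1;
}

print "first = $_\n" for @entries;
```

Note that an address without a mailto: prefix, as in your second string, has no start marker, so this pattern would leave it attached to the preceding entry.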

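Or you could avoid the messing about with the while condition and use
split.  Split on the prefix together with the comma in front of it, and
drop the empty leading field that split returns because the string
starts with a separator (say needs "use feature 'say'" or perl 5.10+):

    use feature 'say';
    say for grep { length } split /,?$start/, $str;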

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
