On Fri, May 21, 2010 at 8:42 AM, "Akhthar Parvez K"
<akht...@sysadminguide.com> scribbled:

> On Friday 21 May 2010, Akhthar Parvez K wrote:

> I am stuck with regex again, this time I really need to *fix* it:
> 
> Code:
> 
> my @data = ( 'Twinkle twinkle little star
> How I wonder what you are
> Up above the world so high
> Like (a) diamond in the sky.
> 123
> Twinkle twinkle little star
> How I wonder what you are');
> my $rx1 = qr{ little(\D*wonder) }imx;
> my $rx2 = qr{ high(.*like) }imx;
> my @regex = ($rx1, $rx2);
> my $regx = join ("|", @regex);
> print "regx: $regx\n";
> my @matches = map { tr/\n//d; /($regx)/g } @data;
> print 'array: ', Dumper \@matches;
> 
> Output:
> regx: (?ix-sm: little(\D*wonder) )|(?ix-sm: high(.*like) )
> array: $VAR1 = [
>           'little starHow I wonder',
>           ' starHow I wonder',
>           undef,
>           'highLike',
>           undef,
>           'Like',
>           'little starHow I wonder',
>           ' starHow I wonder',
>           undef
>         ];
> 
> I'm expecting a result like this:
> regx: (?ix-sm: little(\D*wonder) )|(?ix-sm: high(.*like) )
> array: $VAR1 = [
>           'little starHow I wonder',
>           ' starHow I wonder',
>           'highLike',
>           'Like',
>           'little starHow I wonder',
>           ' starHow I wonder',
>          ];
> 
> I would like to know why these undefs are appearing in between and how I can
> get rid of them. I am sure it's due to the way I am concatenating the
> regexes (with join), as it works fine if I use just one regex, but how can
> this be fixed when I'm using multiple regexes?

You are getting undefs because you have alternation (|) between two
sub-patterns and capturing parentheses in each sub-pattern. You also have
nested parentheses, with a capturing pair of parentheses around the whole.

Leaving out the modifiers, your regular expression is effectively this:

    /(little(\D*wonder)|high(.*like))/

with 3 sets of capturing parentheses. Perl returns what was matched by each
set of () for each match. You have 3 sets of () and 3 matches, so you get 9
returned values. Because you have alternation, only one sub-pattern matches
each time, and the pair of () in the sub-pattern that did not match returns
undef.
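
To make the counting concrete, here is a minimal sketch that reproduces the 9 values and then drops the undefs with grep. It assumes the newlines have already been removed from the text, as the tr/\n//d in the original code does; filtering with grep { defined } is just one possible way to get the expected list, not necessarily the best one for the real data.

```perl
use strict;
use warnings;

# Same text as in the original post, with the newlines already removed.
my $text = join '',
    'Twinkle twinkle little star',
    'How I wonder what you are',
    'Up above the world so high',
    'Like (a) diamond in the sky.',
    '123',
    'Twinkle twinkle little star',
    'How I wonder what you are';

my $rx1  = qr{ little(\D*wonder) }imx;
my $rx2  = qr{ high(.*like) }imx;
my $regx = join '|', $rx1, $rx2;

# In list context, /g returns every capture group for every match,
# including the group in the alternative that did not take part.
my @all = $text =~ /($regx)/g;

# Keep only the groups that actually captured something.
my @matches = grep { defined } @all;

# prints "9 values, 6 defined"
printf "%d values, %d defined\n", scalar @all, scalar @matches;
print "$_\n" for @matches;
```

The six remaining values come out in the order shown in the "expecting" list above.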

Can you explain what it is you are trying to do? You are probably better off
not trying to do it in one go with a regular expression. Find your marker
strings (maybe with index), extract your data (with substr), and do whatever
processing you need.
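
A rough sketch of the index/substr idea follows. The helper name between_markers is made up for illustration, and the sketch assumes fixed, case-sensitive marker strings; it stops at the nearest end marker and does not enforce the \D restriction of the original pattern, so it is not a drop-in replacement for the regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from $start through the nearest
# following $end, for every non-overlapping occurrence in $text.
sub between_markers {
    my ($text, $start, $end) = @_;
    my @found;
    my $pos = 0;
    while (1) {
        # Locate the start marker at or after the current position.
        my $from = index($text, $start, $pos);
        last if $from < 0;
        # Locate the end marker after the start marker.
        my $to = index($text, $end, $from + length($start));
        last if $to < 0;
        $to += length($end);
        push @found, substr($text, $from, $to - $from);
        $pos = $to;    # continue searching after this hit
    }
    return @found;
}

my $text = join '',
    'Twinkle twinkle little star',
    'How I wonder what you are',
    'Up above the world so high',
    'Like (a) diamond in the sky.',
    '123',
    'Twinkle twinkle little star',
    'How I wonder what you are';

my @hits = between_markers($text, 'little', 'wonder');
print "$_\n" for @hits;
```

Because index is case-sensitive, the second pair of markers would have to be spelled 'high' and 'Like' to match this text.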

If you really want to use regular expressions, consider using look-ahead
tests and a while loop, and don't expect to get everything you need in a
single pass through your string.
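
As an illustration of the while-loop approach, the sketch below makes one pass per pattern with a scalar-context //g match, records where each match started, and sorts by that offset at the end. It covers only the loop and uses no look-ahead. Sorting by offset is my own assumption about how to restore document order; it may not be what you need for the real data.

```perl
use strict;
use warnings;

my $text = join '',
    'Twinkle twinkle little star',
    'How I wonder what you are',
    'Up above the world so high',
    'Like (a) diamond in the sky.',
    '123',
    'Twinkle twinkle little star',
    'How I wonder what you are';

my @regex = (
    qr{ little(\D*wonder) }imx,
    qr{ high(.*like) }imx,
);

my @found;
for my $rx (@regex) {
    # Scalar-context //g finds one match per iteration, so only the
    # groups of the current pattern exist and none of them is undef.
    while ($text =~ /($rx)/g) {
        # $-[0] is the offset at which the whole match started.
        push @found, [ $-[0], $1, $2 ];
    }
}

# Restore document order, then flatten to (whole match, inner capture) pairs.
my @matches = map { @$_[1, 2] } sort { $a->[0] <=> $b->[0] } @found;

print "$_\n" for @matches;
```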




-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

