On Friday 21 May 2010, Jim Gibson wrote:
> You are getting undefs because you have alternation (|) between two
> sub-patterns and capturing parentheses in each sub-pattern. You also have
> nested parentheses, with a capturing pair of parentheses around the whole.
> 
> Your regular expression is this:
> 
>     /(little(\D*wonder)|high(.*like))/
> 
> with 3 sets of parentheses. Perl is returning what matched by each set of ()
> for each match. You have 3 sets of () and 3 matches. Therefore you are
> getting 9 returned values. Because you have alternation, only one sub-pattern is
> matching, and the pair of () in the non-matched sub-pattern is returning
> undef.

That's strange. When I put alternation between sub-patterns, why would Perl
return unwanted undefs? I'm asking Perl to match all of these (using
alternation - |) and to return the strings that match the sub-patterns. I
expected Perl to give just that, and I still don't see why it should give me
undefs. Shouldn't Perl be smart enough to understand that the outermost pair
of parentheses and the | symbol were used solely for alternation (since there
is nothing within the outermost parentheses except other sets of parentheses
and the alternation), and that it doesn't need to pick anything out of it?
If it understands when I ask it to pick either abc or xyz - /(abc|xyz)/ - why
wouldn't it understand when I ask it to pick the ab of abc and the yz of xyz
- /((ab)c|x(yz))/ - or even /(?:(ab)c|x(yz))/? "Not understanding" may be
putting it too strongly, since it does print what was requested, but the
additional blanks should have been avoided. Could someone please explain this
to me? If it has to do with my regex pattern, I'm happy to change it. But if
the answer is that this is not possible with Perl regexes, I would be
disappointed, as I think Perl's regex engine should handle that.
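
To make the question concrete, here is a minimal sketch of what I mean. The
input 'abc xyz' is made up for illustration. The second pattern uses the
branch-reset group (?|...), which exists in Perl 5.10 and later and numbers
the capture groups of every alternative from the same starting point; I am
assuming a Perl at least that new.

```perl
use strict;
use warnings;

# Hypothetical input, made up for illustration.
my $text = 'abc xyz';

# Two capture groups, one in each alternative. Every match fills one
# group and leaves the other undef, so the list has 2 slots per match.
my @plain = $text =~ /(?:(ab)c|x(yz))/g;

# Branch reset (Perl 5.10+): both alternatives share group 1, so there
# is only one slot per match and no undef.
my @reset = $text =~ /(?|(ab)c|x(yz))/g;

print join(',', map { defined $_ ? $_ : 'undef' } @plain), "\n";
print join(',', @reset), "\n";
```

The first print shows ab,undef,undef,yz and the second shows ab,yz, so the
undefs come from the group that did not take part in each match.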

> Can you explain what it is you are trying to do? You are probably better off
> not trying to do it in one go with a regular expression. Find your marker
> strings (maybe with index) extract your data (with substr) and do whatever
> processing you need.

What I'm trying to do is pick up some strings. I've got a pattern that gives
2 outputs per match (using parentheses): first the matched string, and second
the critical part of it. Taking the earlier example, "little starHow I
wonder" is the matched string and " starHow I wonder" is the critical part of
it.

> If you really want to use regular expression, consider using look-ahead
> tests and a while loop and don't expect to get everything you need in a
> single pass through your string.

Earlier I was using a for loop to get both the matched part and the critical
part. However, I thought it would be better to avoid the loop, as it would
cause an unwanted delay when there are more strings to match, so I wanted to
get it done in a single statement, or a group of statements, without any
loop. The above regex method worked for me, except that it gave me some
undefs as well. Is there any way to get rid of them using the same
statements, maybe by adding a conditional statement inside the map?
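
One form such a condition could take is a grep on defined, sketched below
with the pattern quoted above. The input string is made up for illustration
and is not the one from my earlier mail, and the variable names are only
placeholders.

```perl
use strict;
use warnings;

# Made-up input for illustration; not the string from the earlier mail.
my $text = 'little starHow I wonder what you are, up so high like a diamond';

# Same pattern as quoted above: 3 capture groups, so 3 slots per match.
my @all = $text =~ /(little(\D*wonder)|high(.*like))/g;

# Keep only the groups that actually took part in a match.
my @wanted = grep { defined } @all;

print scalar(@all), " values, ", scalar(@wanted), " defined\n";
print "[$_]\n" for @wanted;
```

With this input there are two matches, so @all holds 6 values and @wanted
holds the 4 defined ones: the matched string and the critical part for each
match, in order.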

-- 
Regards,
Akhthar Parvez K
http://tips.sysadminguide.com/
UNIX is basically a simple operating system, but you have to be a genius to 
understand the simplicity - Dennis Ritchie
