On 04/05/2013 14:26, Florian Huber wrote:
I'm parsing a logfile and don't quite understand the behaviour of m//.
From a previous regex match I have already captured $+{'GFP'}:
use strict;
use warnings;
(...)
# I simply have my whole logfile in $text - I know there are better solutions.
$text =~ m/ (?<GFP>FILTERS .*? WRT)/x;
print $+{'GFP'}, "\n";
prints this:
FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT
Now I want to go on parsing $+{'GFP'}. To be more precise, I want to
capture "AVG 4":
If I do:
$+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
I get a warning that I'm using an uninitialised value and:
$+{'GFP'} is .
$+{'AVG'} is AVG 4.
So first question: Apparently some return value of $+{'GFP'} =~
m/PATTERN/; messes with this hash value. From what I can remember,
the return value indicates whether the match was successful or
not. Then why don't I get some value like 0 or 1 in $+{'GFP'} but just
an uninitialised value?
If the match is not successful, $+{'GFP'} will stay untouched:
$+{'GFP'} =~ m/(?<AVG>AVG\s\d nothere)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
will print:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .
Not surprisingly, $+{'AVG'} is uninitialised here.
It didn't quite make sense to me but I figured that the problem might be
that m// in list context returns a list of the capture variables created
in the match. So I tried:
$+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
prints this:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .
So now the match suddenly fails?!?
Hello Florian
First a couple of points
- Don't use named captures for simple regexes like this. They make the
code harder to understand, and are really only useful when using complex
patterns with multiple captures
- The built-in variables that relate to regular expressions are modified
by every successful pattern match. It is safer to save values that you
may want to use later in a separate variable. In particular, your regex
m/(?<AVG>AVG\s\d)/ matches and, because it has no capture named `GFP`,
%+ no longer has an element with that key, so $+{GFP} reads as undef.
However $+{AVG} is now set, as you did have a capture with that name.
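Here is a minimal sketch of that second point, using the sample record from your message as input. The names $line, $gfp, $avg and $gfp_in_hash are only illustrative and are not taken from your program.

```perl
use strict;
use warnings;

# Sample record copied from the question; all variable names are illustrative.
my $line = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

$line =~ m/(?<GFP>FILTERS .*? WRT)/x or die "No FILTERS record found";
my $gfp = $+{GFP};    # copy the capture before running another match

# A successful match rebuilds %+ from the names used in this pattern only.
$gfp =~ m/(?<AVG>AVG\s\d)/ or die "No AVG field found";
my $avg = $+{AVG};

my $gfp_in_hash = exists $+{GFP} ? 'yes' : 'no';

print "GFP still in %+: $gfp_in_hash\n";
print "saved copy: $gfp\n";
print "AVG: $avg\n";
```

The saved copy in $gfp survives the second match, while the GFP entry in %+ does not.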
The pattern match
$+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
is in void context (i.e. the result is being discarded). That is mostly
equivalent to scalar context as far as operator behaviour is concerned.
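To illustrate the difference between the contexts, here is a small sketch, again assuming the sample record from your message. The names $gfp, $ok and $avg are only illustrative.

```perl
use strict;
use warnings;

# Sample record copied from the question; variable names are illustrative.
my $gfp = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

# Scalar context: the match returns true or false, not the captured text.
my $ok = $gfp =~ m/(AVG\s\d)/;

# List context: the match returns the list of captures.
my ($avg) = $gfp =~ m/(AVG\s\d)/;

print "scalar result: $ok\n";
print "list result: $avg\n";
```

In neither case is anything assigned to the left-hand side of =~. That operand is only the string being searched.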
And you have made things worse by writing
$+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
which is equivalent to
$+{'GFP'} =~ ($_ =~ /(?<AVG>AVG\s\d)/);
so it applies the pattern to the $_ variable, then uses the result of that
match as another regex and applies that to $+{GFP}.
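A sketch of what that means in practice, assuming purely for illustration that $_ happens to contain text the inner pattern matches. I don't know what $_ held in your program, and the names $gfp, $inner, $outer and $matched_text are mine.

```perl
use strict;
use warnings;

# Sample record copied from the question; variable names are illustrative.
my $gfp = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

# m// with no =~ in front of it is matched against $_.
# ASSUMPTION: $_ is set here so that the inner match succeeds.
$_ = 'AVG 4 somewhere else';
my $inner = scalar m/(?<AVG>AVG\s\d)/;    # true, so $inner is 1

# The outer =~ now uses that result, the string "1", as its pattern.
my $outer = $gfp =~ $inner;
my $matched_text = $&;                    # text matched by the outer pattern

print "inner result: $inner\n";
print "outer matched: '$matched_text'\n";
```

Here the outer match finds the "1" in "100%", which has nothing to do with the AVG field. If the inner match fails instead, its result is the empty string, and an empty pattern makes Perl reuse the last successfully matched regex, which is even more confusing.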
It would help to be able to see the format of your input data. If you
know that reading the entire file in is a bad idea then you shouldn't be
doing it.
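Without seeing your data I can only guess, but if each FILTERS ... WRT record sits on a single line, a line-by-line loop might look like the sketch below. The sample log, the in-memory filehandle and the names $sample, $fh, $line and @averages are all assumptions made for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: each "FILTERS ... WRT" record sits on a single line of the log.
# Only the FILTERS line comes from the question; the other lines are made up.
my $sample = <<'END_LOG';
some other log line
FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT
another line
END_LOG

# An in-memory filehandle stands in for opening the real logfile.
open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";

my @averages;
while (my $line = <$fh>) {
    # A list-context match returns the captured text directly.
    if (my ($avg) = $line =~ m/FILTERS.*?(AVG\s+\d).*?WRT/) {
        push @averages, $avg;
    }
}
close $fh;

print "$_\n" for @averages;
```

This keeps only one line in memory at a time and never touches %+ or $1 after the fact.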
This short piece of code does what you need, but I am sure there is a
better way.
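# Check that each match succeeded before trusting $1, and copy $1 straight away.
$text =~ m/(FILTERS.*?WRT)/ or die "No FILTERS record found";
my $gfp = $1;
$gfp =~ m/(AVG\s+\d)/ or die "No AVG field found";
my $avg = $1;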
HTH,
Rob