On 05/04/2013 04:37 PM, Rob Dixon wrote:
On 04/05/2013 14:26, Florian Huber wrote:
I'm parsing a logfile and don't quite understand the behaviour of m//.
From a previous regex match I have already captured $+{'GFP'}:
use strict;
use warnings;
(...)
$text =~ m/ (?<GFP>FILTERS .*? WRT)/x; # I simply have my whole
# logfile in $text - I know there are better solutions.
print $+{'GFP'}, "\n";
prints this:
FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT
Now I want to go on parsing $+{'GFP'}. To be more precise, I want to
capture "AVG 4":
If I do:
$+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
I get a warning that I'm using an uninitialised value and:
$+{'GFP'} is .
$+{'AVG'} is AVG 4.
So first question: Apparently some return value of $+{'GFP'} =~
m/PATTERN/; messes with this hash value. So from what I can remember,
the return value will indicate if the match was successful or
not. Then why don't I get some value like 0 or 1 in $+{'GFP'} but just
an uninitialised value?
If the match is not successful, $+{'GFP'} will stay untouched:
$+{'GFP'} =~ m/(?<AVG>AVG\s\d nothere)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
will print:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .
Not surprisingly, $+{'AVG'} is uninitialised here.
It didn't quite make sense to me but I figured that the problem might be
that m// in list context returns a list of the capture variables created
in the match. So I tried:
$+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";
prints this:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .
So now the match suddenly fails?!?
Hello Florian
First a couple of points
- Don't use named captures for simple regexes like this. They make the
code harder to understand, and are really only useful when using complex
patterns with multiple captures
Well, to be honest, it's a bit more complicated than this - but I didn't
want to write too long an email. Anyway, it boils down to the following
critical piece in the logfile:
# Move Polychroic Turret In To Place
SET_POLYCHROIC GFP/YFP
# Open image file
OPF
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
FILTERS POL,POL,100%
ACTSHUT 18
CCD 0.050000
WRT
FILTERS DAPI,GFP,100%
ACTSHUT 17
CCD 1.000000
WRT
# Close image file(s)
CLF
BEEP
So I actually need to capture the parameters from all the four filter
sets (this is a microscope logfile) - hence the named captures. Plus,
the parameters within one category may vary, so sometimes there will be
an "AVG", sometimes there won't - if I simply run a long regex over
everything, I'm afraid that one slight deviation will mess everything
up. For example, I try to capture (AVG\s\d) but then there is none - so
if I do something like
my ($one, $AVG, $three) = $text =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;
and one of the patterns is not there, still everything will be shifted
to the right, won't it? I figured that by naming the captures, I will
always know if there was a match at the very position intended.
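A minimal sketch of how an optional group behaves in list context, with hypothetical patterns standing in for PATTERN1 and PATTERN3. As far as I know, a group that takes no part in the match still occupies its slot and yields undef, so the later captures are not shifted:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for PATTERN1 and PATTERN3; only the middle group is optional.
my $re = qr/(ACTSHUT\s\d+)\s+(AVG\s\d)?.*?(WRT)/s;

my @with    = "ACTSHUT 17 AVG 4,1.000000 WRT" =~ $re;
my @without = "ACTSHUT 17 CCD 2.000000 WRT"   =~ $re;

# Both lists have three elements; the missing AVG shows up as undef in slot 2.
for my $caps ( \@with, \@without ) {
    print join( " | ", map { $_ // '(undef)' } @$caps ), "\n";
}
```

So positional captures stay aligned; named captures mainly help readability once there are many groups.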
- The built-in variables that relate to regular expressions are modified
by every successful pattern match. It is safer to save values that you
may want to use later in a separate variable. In particular, your regex
m/(?<AVG>AVG\s\d)/ matches and, because there is no capture named `GFP`
it sets the corresponding element of %+ to undef. However $+{AVG} is now
set, as you did have a capture with that name.
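A small sketch of that advice, assuming the one-line block printed earlier in this thread as sample input. The variable names $gfp and $avg are only illustrative:

```perl
use strict;
use warnings;

# Sample input taken from the output shown earlier in the thread.
my $text = "FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT";

$text =~ m/ (?<GFP>FILTERS .*? WRT)/x or die "no FILTERS block found";
my $gfp = $+{GFP};    # copy now: the next successful match resets %+

$gfp =~ m/(?<AVG>AVG\s\d)/ or die "no AVG entry found";
my $avg = $+{AVG};

# %+ now describes only the second match, but the saved copies survive.
print "GFP still in %+? ", ( exists $+{GFP} ? "yes" : "no" ), "\n";
print "saved block: $gfp\n";
print "saved AVG:   $avg\n";
```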
That explains a lot, thank you.
The pattern match
$+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
is in void context (i.e. the result is being discarded). That is mostly
equivalent to scalar context as far as operator behaviour is concerned.
And you have made things worse by writing
$+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
which is equivalent to
$+{'GFP'} =~ ($_ =~ /(?<AVG>AVG\s\d)/);
so it applies the pattern to the $_ variable, and uses the result of that
match as another regex and applies that to $+{GFP}.
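To make that concrete, here is a hedged sketch with made-up strings. $_ is set so that the inner match succeeds and returns 1, which means the outer operation effectively tests the left-hand string against the pattern 1:

```perl
use strict;
use warnings;

$_ = "AVG 4";    # the inner m// silently tests this, not the left-hand string

# Contains "AVG 4" but no digit 1, so the outer match against /1/ fails.
my $missed = ( "AVG 4,2.000000 WRT" =~ scalar m/AVG\s\d/ ) ? 1 : 0;

# Contains no AVG at all, but it does contain a 1, so the outer match succeeds.
my $bogus = ( "ACTSHUT 17" =~ scalar m/AVG\s\d/ ) ? 1 : 0;

print "string with AVG matched:    $missed\n";
print "string without AVG matched: $bogus\n";
```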
So what is the return value of the pattern match operator then? 0 if
successful, 1 if not? The number of matches?
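For reference, a short sketch of what m// returns in each context, as I understand perlop. The sample string is taken from the logfile and the variable names are only illustrative:

```perl
use strict;
use warnings;

my $line = "AVG 4,1.000000";

# Scalar context: 1 on success, the empty string (false) on failure.
my $ok   = ( $line =~ /AVG/ );
my $fail = ( $line =~ /CCD/ );

# List context with capture groups: the captured substrings.
my @caps = ( $line =~ /(AVG)\s(\d)/ );

# List context without capture groups: the one-element list (1).
my @plain = ( $line =~ /AVG/ );

# List context on failure: the empty list.
my @none = ( $line =~ /(CCD)/ );

# With /g in list context: every match; the "() =" idiom counts them.
my $digits = () = $line =~ /\d/g;

printf "ok=%s fail='%s' caps=%s plain=%s none=%d digits=%d\n",
    $ok, $fail, "@caps", "@plain", scalar @none, $digits;
```

So the result is true on success and false on failure, never 0 for success.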
It would help to be able to see the format of your input data. If you
know that reading the entire file in is a bad idea then you shouldn't be
doing it.
Well, the thing is, the logfiles are a few kb long - and reading
everything into one string does make some regexes easier, in my opinion.
I tried first with while loops but then I found it very difficult to
discriminate between recurring patterns and storing whatever is found in
between. Maybe it's just me because I'm using Perl only every now and
then and there are cleverer ways of doing that. In my case, I figured
that the bit of time/RAM I lose is made up by having easier regexes.
This short piece of code does what you need, but I am sure there is a
better way.
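# List assignment takes the capture directly and yields undef on failure,
# whereas $1 would keep its value from the last successful match.
my ($gfp) = $text =~ m/(FILTERS.*?WRT)/s;    # /s lets . match newlines
my ($avg) = ( $gfp // '' ) =~ m/(AVG\s+\d)/;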
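If all filter sets are needed, one possible approach is to walk over the blocks with m//g and collect the settings in a hash. This is only a sketch under assumptions: the sample is shortened from the excerpt above, the hash %settings and its keys avg and ccd are made up for illustration, and each block is keyed by the first name after FILTERS:

```perl
use strict;
use warnings;

# Shortened sample based on the logfile excerpt in this thread.
my $text = <<'END';
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
END

my %settings;    # hypothetical structure: filter name => { avg, ccd }

# /m anchors ^ and $ at line boundaries, /s lets . cross newlines,
# /g resumes after the previous block on each loop iteration.
while ( $text =~ m/^FILTERS\s+(\w+),(.*?)^WRT$/msg ) {
    my ( $name, $body ) = ( $1, $2 );    # save before any further match

    # Each parameter is optional; a failed match simply leaves undef.
    my ($avg) = $body =~ m/^AVG\s+(\d+)/m;
    my ($ccd) = $body =~ m/^CCD\s+([\d.]+)/m;

    $settings{$name} = { avg => $avg, ccd => $ccd };
}

for my $name ( sort keys %settings ) {
    printf "%s: AVG=%s CCD=%s\n", $name,
        $settings{$name}{avg} // 'none',
        $settings{$name}{ccd} // 'none';
}
```

Because every parameter is looked up separately inside its own block, a missing AVG or CCD line does not disturb the others.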
HTH,
Rob
Thanks again,
Florian