Re: Pattern match operator

Florian Huber Sat, 04 May 2013 11:54:26 -0700

On 05/04/2013 04:37 PM, Rob Dixon wrote:

On 04/05/2013 14:26, Florian Huber wrote:


I'm parsing a logfile and don't quite understand the behaviour of m//.

From a previous regex match I have already captured $+{'GFP'}:

use strict;
use warnings;

(...)

# I simply have my whole logfile in $text - I know there are better solutions.
$text =~ m/ (?<GFP>FILTERS .*? WRT)/x;
print $+{'GFP'}, "\n";

prints this:
FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT

Now I want to go on parsing $+{'GFP'}. To be more precise, I want to
capture "AVG 4":

If I do:

$+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";

I get a warning that I'm using an uninitialised value and:

$+{'GFP'} is .
$+{'AVG'} is AVG 4.

So first question: apparently some return value of $+{'GFP'} =~
m/PATTERN/; messes with this hash value. From what I remember, the
return value indicates whether the match was successful or not. Then
why don't I get a value like 0 or 1 in $+{'GFP'}, but just an
uninitialised value?

If the match is not successful, $+{'GFP'} will stay untouched:

$+{'GFP'} =~ m/(?<AVG>AVG\s\d nothere)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";

will print:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .

Not surprisingly, $+{'AVG'} is uninitialised here.

It didn't quite make sense to me, but I figured that the problem might be
that m// in list context returns a list of the values captured in the
match. So I tried:

$+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
print "\$+{'GFP'} is $+{'GFP'}.\n";
print "\$+{'AVG'} is $+{'AVG'}.\n";

prints this:
$+{'GFP'} is FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT.
$+{'AVG'} is .

So now the match suddenly fails?!?


Hello Florian

First, a couple of points:

- Don't use named captures for simple regexes like this. They make the
code harder to understand, and are really only useful when using complex
patterns with multiple captures.

Well, to be honest, it's a bit more complicated than this - but I didn't want to write too long an email. Anyway, it boils down to the following critical piece of the logfile:

# Move Polychroic Turret In To Place
SET_POLYCHROIC GFP/YFP

# Open image file
OPF
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
FILTERS POL,POL,100%
ACTSHUT 18
CCD 0.050000
WRT
FILTERS DAPI,GFP,100%
ACTSHUT 17
CCD 1.000000
WRT

# Close image file(s)
CLF
BEEP

So I actually need to capture the parameters from all four filter sets (this is a microscope logfile) - hence the named captures. Plus, the parameters within one category may vary, so sometimes there will be an "AVG" and sometimes there won't. If I simply run one long regex over everything, I'm afraid that one slight deviation will mess everything up. For example, I try to capture (AVG\s\d) but then there is none - so if I do something like


my ($one, $AVG, $three) = $text =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;

and one of the patterns is not there, everything will still be shifted to the right, won't it? I figured that by naming the captures, I will always know whether there was a match at the exact position intended.
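For what it's worth, numbered captures do not shift when an optional group fails to match: the group keeps its number and its slot in the returned list is simply undef. A minimal sketch, using two made-up one-line records (hypothetical, loosely modelled on the log excerpt above) and the same pattern in numbered and in named form:

```perl
use strict;
use warnings;

# Hypothetical one-line records: the first has an AVG field, the second does not.
my @records = (
    'ACTSHUT 17 AVG 4,1.000000',
    'ACTSHUT 18 CCD 0.050000',
);

my @numbered;   # list-context results of the numbered-capture match
my @has_avg;    # 1 if %+ held a defined AVG value after the named match

for my $rec (@records) {
    # Group 2 is optional. If it does not take part in the match, its
    # slot in the returned list is undef; group 3 is still group 3.
    my @caps = $rec =~ m/(ACTSHUT\s\d+)\s(AVG\s\d)?.*?(\d+\.\d+)/;
    push @numbered, \@caps;

    # Same idea with named captures: a group that did not participate
    # has no defined value in %+.
    if ($rec =~ m/(?<SHUT>ACTSHUT\s\d+)\s(?<AVG>AVG\s\d)?.*?(?<VAL>\d+\.\d+)/) {
        push @has_avg, defined $+{AVG} ? 1 : 0;
    }
}

for my $caps (@numbered) {
    print join(' | ', map { defined $_ ? $_ : 'undef' } @$caps), "\n";
}
```

This prints `ACTSHUT 17 | AVG 4 | 1.000000` for the first record and `ACTSHUT 18 | undef | 0.050000` for the second, so checking `defined` on the relevant variable is enough to tell whether the optional part was present.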

- The built-in variables that relate to regular expressions are modified
by every successful pattern match. It is safer to save values that you
may want to use later in a separate variable. In particular, your regex
m/(?<AVG>AVG\s\d)/ matches and, because it has no capture named `GFP`,
%+ no longer holds a GFP entry, so $+{GFP} returns undef. However,
$+{AVG} is now set, as you did have a capture with that name.
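The advice to save values before the next match can be sketched like this. The sample string is the one printed earlier in the thread; the `or die` checks and the variable names are additions for illustration, so that a failed match is noticed instead of leaving stale values in use:

```perl
use strict;
use warnings;

my $text = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

$text =~ m/(?<GFP>FILTERS .*? WRT)/x
    or die "no FILTERS ... WRT block found\n";

# Copy the capture into an ordinary lexical before running another match:
# %+ (like $1, $2, ...) describes only the most recent successful match.
my $gfp = $+{GFP};

$gfp =~ m/(?<AVG>AVG\s\d)/
    or die "no AVG field found\n";
my $avg = $+{AVG};

# After the second match, %+ has a defined AVG value but no GFP value.
my $gfp_still_in_hash = defined $+{GFP} ? 1 : 0;

print "GFP: $gfp\nAVG: $avg\n";
```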


That explains a lot, thank you.


The pattern match

   $+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;

is in void context (i.e. the result is being discarded). That is mostly
equivalent to scalar context as far as operator behaviour is concerned.
And you have made things worse by writing

    $+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;

which is equivalent to

    $+{'GFP'} =~ ($_ =~ /(?<AVG>AVG\s\d)/);

so it applies the pattern to the $_ variable, then uses the result of that
match as another regex and applies that to $+{GFP}.
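To make the precedence issue concrete, here is a small sketch (the sample string comes from earlier in the thread; the variable names are made up). The problematic form is left commented out, and scalar() is applied to the whole binding expression instead:

```perl
use strict;
use warnings;

my $gfp = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

# Pitfall (left commented out): here scalar() applies only to m//, and a
# match with no =~ binding runs against $_. Its result is then used as
# the pattern that is matched against $gfp:
#     $gfp =~ scalar m/(AVG\s\d)/;

# Intended: put scalar() around the whole binding expression. Even in
# list context (assigning to an array) this yields the scalar true/false
# result rather than the list of captures.
my @forced = scalar($gfp =~ m/(AVG\s\d)/);
my $avg    = $1;    # the match above succeeded on this sample string

print "forced: @forced, capture: $avg\n";
```

With this sample the output is `forced: 1, capture: AVG 4`.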

So what is the return value of the pattern match operator then? 0 if successful, 1 if not? The number of matches?
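On the return-value question: m// never writes anything into the variable on the left of =~, which only selects the string to search. What it returns depends on context, as this short sketch (plain m//, without /g; the sample string is from the thread) shows:

```perl
use strict;
use warnings;

my $gfp = 'FILTERS GFP,GFP,100% ACTSHUT 17 AVG 4,1.000000 WRT';

# Scalar context: true (1) on success, false (the empty string) on failure.
my $hit  = $gfp =~ m/AVG\s\d/;
my $miss = $gfp =~ m/AVG\s\d nothere/;

# List context: the captured substrings on success...
my @caps = $gfp =~ m/(AVG)\s(\d)/;

# ...the list (1) if the pattern has no capture groups...
my @no_groups = $gfp =~ m/AVG\s\d/;

# ...and an empty list on failure.
my @none = $gfp =~ m/(AVG)\s(\d) nothere/;

printf "hit=%s miss='%s' caps=(%s) no_groups=(%s) none=%d elements\n",
    $hit, $miss, join(',', @caps), join(',', @no_groups), scalar @none;
```

So success is true (1) and failure is false (the empty string), not 0 for success; the count of matches only comes into play with constructs such as a list-context m//g.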

It would help to be able to see the format of your input data. If you
know that reading the entire file in is a bad idea then you shouldn't be
doing it.

Well, the thing is, the logfiles are a few kB long, and reading everything into one string does make some regexes easier, in my opinion. I tried while loops first, but then I found it very difficult to discriminate between recurring patterns and store whatever is found in between. Maybe it's just me, because I only use Perl every now and then and there are cleverer ways of doing that. In my case, I figured that the bit of time/RAM I lose is made up for by having easier regexes.
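If the whole-file approach ever becomes awkward, one common alternative to a single big regex is a small line-by-line state machine: start a record on a FILTERS line, collect KEY VALUE lines, and close the record at WRT. This is only a sketch, not code from this thread; the sample data, the hash layout and the variable names are hypothetical, and it reads from an in-memory filehandle so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical excerpt of the microscope log; in real use you would
# loop over a filehandle opened on the logfile instead.
my $log = <<'END_LOG';
OPF
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
CLF
END_LOG

my @sets;       # one hash ref per FILTERS ... WRT block
my $current;    # the block being filled in, or undef between blocks

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^FILTERS\s+(\S+)/) {
        $current = { FILTERS => $1 };        # a new filter set starts
    }
    elsif ($line eq 'WRT') {
        push @sets, $current if $current;    # the set is complete
        undef $current;
    }
    elsif ($current and $line =~ /^(\w+)\s+(.*)$/) {
        # Any KEY VALUE line inside a block; optional keys such as AVG
        # are simply absent from the hash when the log omits them.
        $current->{$1} = $2;
    }
}
close $fh;

for my $set (@sets) {
    my $avg = defined $set->{AVG} ? $set->{AVG} : 'none';
    print "$set->{FILTERS}: AVG $avg\n";
}
```

Because each block becomes its own hash, a missing field cannot shift the others, which is the concern raised about one long regex.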

This short piece of code does what you need, but I am sure there is a
better way.

    $text =~ m/(FILTERS.*?WRT)/;
    my $gfp = $1;

    $gfp =~ m/(AVG\s+\d)/;
    my $avg = $1;
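Rob's two-step idea can also be extended to walk every FILTERS ... WRT block in the whole-file string. A sketch under a few assumptions: the blocks span several lines as in the log excerpt, so /s lets . match newlines and /m lets ^ and $ match at line boundaries; the sample text and %avg_for are hypothetical. Assigning the inner match in list context gives undef when there is no AVG line, rather than a stale $1 from an earlier match:

```perl
use strict;
use warnings;

# Hypothetical multi-line sample: each field is on its own line.
my $text = "FILTERS YFP,YFP,100%\nACTSHUT 17\nCCD 2.000000\nWRT\n"
         . "FILTERS GFP,GFP,100%\nACTSHUT 17\nAVG 4,1.000000\nWRT\n";

my %avg_for;    # filter name => AVG field, or undef if the set has none

# /g in a while loop walks through every FILTERS ... WRT block in turn.
while ($text =~ m/FILTERS\s+(\S+)(.*?)^WRT$/gms) {
    my ($filter, $body) = ($1, $2);    # copy before the next match

    # List-context assignment: $avg is undef when there is no AVG line.
    my ($avg) = $body =~ m/(AVG\s+\d+)/;
    $avg_for{$filter} = $avg;
}

for my $filter (sort keys %avg_for) {
    printf "%s: %s\n", $filter,
        defined $avg_for{$filter} ? $avg_for{$filter} : 'no AVG';
}
```

The inner match on $body does not disturb the /g position stored for $text, because pos() is tracked per string.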

HTH,

Rob


Thanks again,

Florian
