Owen on Monday, 12 December 2005 22:10:
> Xavier Noria wrote:
> > On Dec 12, 2005, at 11:10, Alexandre Checinski wrote:
> >> I have a string that looks like this:
> >> <counter id="183268" since="SDOPERFV16" aggr="Sum"
> >> name="pcmTcuFaultOutOfService"/>
> >
> > m// in list context may help:
> >
> > my ($id, $name) = $xml =~ m{id="([^"]*)".*name="([^"]*)"/>};
>
> I despair of ever understanding REs
You will; just play around with it.
For example, write a small "quick and dirty" script along these lines:
=start=
use strict;
use warnings;
my $teststring = 'something to test';
my $ok = $teststring =~ m~sts~;   # <<< play around here
print $ok ? "yes, I got it!!!\n" : "I despair of ever understanding REs\n";
=end=
If you think a regex does something, test that something with the script above.
Keep some manuals open:
perldoc perlre
perldoc perlretut
perldoc perlrequick
>
> How does the above work
>
> m Match
> { inside these braces (as the delimiter?)
Yes: the {} are delimiters, used in place of the usual //. That's why the 'm'
after '=~' is mandatory here; it may only be left out when the delimiters are
//. Substitution accepts other delimiters too, e.g. s{...}{...}. Sometimes
regexes are more readable if something other than '//' is used, for example
when matching (Unix) paths.
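A small sketch of the delimiter point, using a made-up path (the path and variable names are only for illustration): the same match written with // and with {}, plus a substitution that uses {} delimiters.

```perl
use strict;
use warnings;

# A made-up Unix path, only for illustration
my $path = '/usr/local/lib/perl5';

# With the default // delimiters the 'm' is optional,
# but every / in the pattern needs a backslash
my $with_slashes = $path =~ /^\/usr\/local\//;

# With {} as delimiters the 'm' is required, but the slashes need no escaping
my $with_braces = $path =~ m{^/usr/local/};

# Substitution accepts other delimiters too: strip the prefix from a copy
(my $relative = $path) =~ s{^/usr/local/}{};

print "slashes: ", ($with_slashes ? 'match' : 'no match'), "\n";
print "braces:  ", ($with_braces  ? 'match' : 'no match'), "\n";
print "relative: $relative\n";
```

Both matches succeed; the {} version is simply easier to read when the pattern itself is full of slashes.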
> id=" the characters id="
> ( Start the capture for $id
> [^"] The list of characters beginning with "
Not exactly: it is the set of all characters *except* '"'; that is what the
caret just after '[' means.
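A quick test in the spirit of the play-around script above, with made-up strings: [^"] matches any single character except a double quote, and the caret only negates when it comes first inside the brackets.

```perl
use strict;
use warnings;

# [^"] matches exactly one character that is NOT a double quote
my @non_quotes = 'a"b"c' =~ /([^"])/g;

# The caret negates only when it comes first inside the brackets;
# anywhere else it is an ordinary '^'
my @caret_or_quote = 'x^y"z' =~ /(["^])/g;

print "@non_quotes\n";        # a b c
print "@caret_or_quote\n";    # ^ "
```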
> But wasn't that done on line 3 where we
> looked for a "
> * any number of characters
(including none)
> ) end of capture for $id
> " the end " for the data element captured
> .* anything until
More precisely: zero or more of any character (except a newline, unless the
/s modifier is used), matched in "greedy" mode, until
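A sketch of what "greedy" means in practice, on a made-up string with several quoted values: .* takes as much as it can and only backs off as far as needed, while the non-greedy .*? takes as little as possible. The last two lines show that '.' does not cross a newline without /s.

```perl
use strict;
use warnings;

# Made-up input with several quoted values
my $s = 'x="1" y="2" z="3"';

# Greedy: .* first runs to the end of the string, then backs off
# only as far as needed to find a '"', i.e. to the LAST quote
my ($greedy) = $s =~ /"(.*)"/;

# Non-greedy: .*? takes as little as possible, so it stops at the FIRST closing quote
my ($lazy) = $s =~ /"(.*?)"/;

print "greedy: $greedy\n";   # greedy: 1" y="2" z="3
print "lazy:   $lazy\n";     # lazy:   1

# '.' does not match a newline unless the /s modifier is given
my $no_s   = "a\nb" =~ /a.b/;
my $with_s = "a\nb" =~ /a.b/s;
```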
> name=" etc do it all again till
>
> } ending delimiter
>
> So I have trouble with [^"]*
This means: zero or more characters, none of which is a '"'.
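To see [^"]* at work, a sketch with a few made-up attribute strings: an empty value is fine because * allows zero characters, and the capture always ends at the first quote because the class cannot step over one. Swapping * for + makes the empty value fail.

```perl
use strict;
use warnings;

# [^"]* : zero or more characters, none of which is a double quote
my @values;
for my $xml ('name="abc"', 'name=""', 'name="a" x="b"') {
    my ($value) = $xml =~ /name="([^"]*)"/;
    push @values, $value;
    print "'$xml' -> '$value'\n";
}
# Results: 'abc', '' (zero characters is fine for *), and 'a'
# (the class cannot step over a quote, so the capture ends at the first one)

# With + instead of *, at least one character is required, so an empty value fails
my $plus_matches = 'name=""' =~ /name="[^"]+"/;
print $plus_matches ? "+ matched\n" : "+ did not match the empty value\n";
```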
>
> What words describe that expression please
my ($id, $name) = $xml =~ m{id="([^"]*)".*name="([^"]*)"/>};
Extract two values, $id and $name, from the string $xml.
Do that by searching for the literal string 'id="';
then capture everything up to the next double quote into $1 (the captured
text cannot itself contain a double quote), and match that closing quote;
then skip everything (greedily) up to the literal string 'name="';
then capture everything up to the next double quote into $2 in the same way,
and match that closing quote;
then match the literal string '/>' directly following it.
Finally, since the match is in list context, it returns ($1, $2), which is
assigned to the list ($id, $name).
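The walk-through can be checked directly. A sketch, assuming the sample string from the question is really on one line (the break in the quoted message is probably just mail wrapping; if the real string contains a newline, add the /s modifier so that '.' matches it too). The /x version is the same pattern with one comment per step; under /x whitespace in the pattern is ignored and # starts a comment, so just take care not to put the delimiter character inside a comment.

```perl
use strict;
use warnings;

# The sample string from the question (assumed to be on a single line)
my $xml = '<counter id="183268" since="SDOPERFV16" aggr="Sum" name="pcmTcuFaultOutOfService"/>';

# The one-liner from the thread: in list context m// returns the captures
my ($id, $name) = $xml =~ m{id="([^"]*)".*name="([^"]*)"/>};
print "id=$id name=$name\n";

# The same pattern under /x, one comment per step of the walk-through
my ($id2, $name2) = $xml =~ m{
    id="        # the literal text id="
    ([^"]*)     # capture 1: zero or more non-quote characters
    "           # the closing quote of the id value
    .*          # skip anything, greedily
    name="      # the literal text name="
    ([^"]*)     # capture 2: zero or more non-quote characters
    "/>         # the closing quote, directly followed by />
}x;

# If the match fails, the empty list is returned and both variables stay undef
my ($no_id, $no_name) = 'no tag here' =~ m{id="([^"]*)".*name="([^"]*)"/>};
print defined $no_id ? "unexpected match\n" : "no match, both undef\n";
```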
The regex could be improved a bit, I think:
1. It would be more tolerant to allow whitespace around '=' and before '/>'.
2. There is a problem with the '.*' in the middle: if the string contains
several tags with a name attribute, it will match the 'name=' of the last
such tag, because '.*' is greedy. A non-greedy '.*?' (or '.+?') stops at the
first one instead.
3. There must be whitespace between an attribute value and the next attribute
name (XML requires it), so at least one character has to be skipped there;
hence '.+?' rather than '.*?'.
This leads to
m{id\s*=\s*"([^"]*)".+?name\s*=\s*"([^"]*)"\s*/>};
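A sketch comparing the two patterns on two made-up inputs: one string holding two tags (to show the greedy '.*' pairing the first id with the last name) and one tag with spaces around '=' and before '/>'. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Two made-up inputs: several tags in one string, and spaces around '=' and before '/>'
my $two_tags = '<counter id="1" name="first"/><counter id="2" name="second"/>';
my $spaced   = '<counter id = "3" name = "third" />';

my $original = qr{id="([^"]*)".*name="([^"]*)"/>};
my $improved = qr{id\s*=\s*"([^"]*)".+?name\s*=\s*"([^"]*)"\s*/>};

# Greedy .* pairs the first id with the LAST name; .+? stops at the first name
my @orig_two     = $two_tags =~ $original;   # ('1', 'second')
my @improved_two = $two_tags =~ $improved;   # ('1', 'first')

# The original pattern has no \s*, so it finds nothing in the spaced tag
my @orig_spaced     = $spaced =~ $original;  # ()
my @improved_spaced = $spaced =~ $improved;  # ('3', 'third')

print "original: @orig_two | @orig_spaced\n";
print "improved: @improved_two | @improved_spaced\n";
```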
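But even this version could be improved.
For example, it can't handle an escaped double quote (\") within an attribute
value. XML has no backslash escapes (a double quote inside a double-quoted
value has to be written as &quot;), so such input is not well-formed anyway,
but it could still be used to trick the regex into doing the wrong thing.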
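A sketch of what can go wrong, using a made-up tag in which \" is used as if it were an escape, and, for contrast, the &quot; form that well-formed XML would use. The &quot; decoding here is a minimal hand-rolled step covering only that one entity, not a general XML unescape.

```perl
use strict;
use warnings;

my $improved = qr{id\s*=\s*"([^"]*)".+?name\s*=\s*"([^"]*)"\s*/>};

# Made-up input in which \" is (wrongly) used as if it escaped a quote
my $tricky = '<counter id="6" name="a\"/>"/>';
my ($id, $name) = $tricky =~ $improved;
print "id=$id name=$name\n";    # the value is cut short at the \"

# In XML a quote inside a double-quoted value is written as &quot;.
# The regex captures it unchanged; decoding is a separate step.
my $escaped = '<counter id="7" name="say &quot;hi&quot;"/>';
my (undef, $raw) = $escaped =~ $improved;
(my $decoded = $raw) =~ s/&quot;/"/g;
print "raw=$raw decoded=$decoded\n";
```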
Somebody please correct me if I'm wrong, thanks; I'm overworked (besides not
being a guru).
hth, joe
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>