On Sep 25, 4:33 pm, [EMAIL PROTECTED] (Rob Dixon) wrote:
> Jonathan Lang wrote:
> > Rob Dixon wrote:
> >> Jonathan Lang wrote:
> >>> I'm trying to devise a regex that matches from the first double-quote
> >>> character found to the next double-quote character that isn't part of
> >>> a pair; but for some reason, I'm having no luck.  Here's what I tried:
>
> >>>   /"(.*?)"(?!")/
>
> >>> Sample text:
>
> >>>   author: "Jonathan ""Dataweaver"" Lang" key=val
>
> >>> What I'm getting for $1 in the first match:
>
> >>>   Jonathan "
>
> >>> What I'm looking for:
>
> >>>   Jonathan ""Dataweaver"" Lang
>
> >>> What did I miss, and how can I most efficiently perform the desired match?
> >> Your regex looks for the first double-quote and then captures everything
> >> after that up to the first subsequent double-quote that isn't followed
> >> immediately by another one. The second quote of the pair before
> >> 'Dataweaver' matches this criterion so your regex captures up to the
> >> character before it.
>
> >> This:
>
> >>   $str =~ /"((?:.*?"")*.*?)"/;
>
> >> should do what you want. After finding the first double-quote it captures
> >> all following sequences ending in a pair of double quotes, plus anything
> >> after those up to the closing quote.
>
> > Ah.  I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it
> > never occurred to me to try the non-capturing group instead.
>
> That also works! (But is performing unnecessary and wasteful captures.)
>
> Rob
>
> use strict;
> use warnings;
>
> my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);
>
> $str =~ /"((.*?"")*.*?)"/;
> print $1, "\n";
>
> **OUTPUT**
>
> Jonathan ""Dataweaver"" Lang

use strict;
use warnings;

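# Built from two pieces so that no newline ends up inside the string
# ('.' does not match a newline by default).
my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val )
        . q(fly-in-ointment: "Brian ""Nobull"" McCauley");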

$str =~ /"((.*?"")*.*?)"/;
print $1, "\n";

__END__

**OUTPUT**

Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley

An alternative pattern would be /"((?:[^"]*"")*.*?)"/ although the
behaviour of that may be counter-intuitive if presented with bad input
in which there's no closing quote.
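To illustrate what I mean by counter-intuitive, here is a small sketch.
The input is made up for the purpose: it is Jonathan's sample with the
closing quote after "Lang" left out. The variable names are mine.

```perl
use strict;
use warnings;

# Hypothetical bad input: the closing quote after "Lang" is missing.
my $bad = q(author: "Jonathan ""Dataweaver"" Lang key=val);

my $alt_capture;
if ($bad =~ /"((?:[^"]*"")*.*?)"/) {
    # The engine backtracks out of the second doubled pair and treats
    # its first quote as the closing quote.
    $alt_capture = $1;
    print "matched: $alt_capture\n";
}
else {
    print "no match\n";
}
```

As far as I can tell this prints 'matched: Jonathan ""Dataweaver', i.e.
the pattern quietly accepts a truncated field rather than failing.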


My preferred pattern would be much closer to Jonathan's original:

/"((?:[^"]|"")*)"(?!")/

This has the advantage of failing to match if presented with input
that lacks a closing quote.
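A minimal sketch of that pattern in use follows. The helper first_field()
and the two bad inputs are invented for illustration; only the pattern
itself and the good sample come from this thread.

```perl
use strict;
use warnings;

my $re = qr/"((?:[^"]|"")*)"(?!")/;

# Hypothetical helper: returns the captured field, or undef if the
# pattern does not match anywhere in the text.
sub first_field {
    my ($text) = @_;
    return $text =~ $re ? $1 : undef;
}

my $good = q(author: "Jonathan ""Dataweaver"" Lang" key=val )
         . q(fly-in-ointment: "Brian ""Nobull"" McCauley");

# Made-up bad inputs: no closing quote, without and with doubled pairs.
my $bad_plain = q(author: "Jonathan Lang key=val);
my $bad_pairs = q(author: "Jonathan ""Dataweaver"" Lang key=val);

my $from_good  = first_field($good);
my $from_plain = first_field($bad_plain);
my $from_pairs = first_field($bad_pairs);

print "good:      $from_good\n";
print "bad_plain: ", (defined $from_plain ? "matched" : "no match"), "\n";
# Unanchored, the engine retries from the second quote and matches the
# doubled pair itself as an empty field.
print "bad_pairs: ", (defined $from_pairs ? "matched '$from_pairs'" : "no match"), "\n";
```

The good sample yields 'Jonathan ""Dataweaver"" Lang' and the plain
unterminated input fails to match. One caveat I should be upfront about:
because the pattern is not anchored, an unterminated field that contains
a doubled pair can still produce a match further along (an empty capture
from the pair itself), so if that matters you would probably want to
anchor the pattern to whatever precedes the opening quote.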

