On Jun 7, Adrian Pang said:

>I'm trying to write a regex expression so it will extract the attribute
>names from a tag.  For example,
>
><P attr1="hello world" attr2 attr3="hi" attr4>
>
>The regex should return attr1, attr2, attr3 and attr4
>Is there any way to write these into one regex expression?

You should really be using a real HTML parser, but for this type of thing
you can use the following code:

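  # match one HTML attribute; capture only its name
  my $attr = qr{
    \G              # continue where the previous match ended
    \s*
    (\w+)           # attribute name
    (?:             # optional value part
      \s* = \s*
      (?:
        " [^"]* " | # double-quoted value
        ' [^']* ' | # single-quoted value
          [^\s>]+   # unquoted value
      )
    )?
  }x;

  my $TAG = q{<img border=0 ismap src='/foo.gif' alt="FOO!">};

  # scalar-context /g leaves pos($TAG) just after "<img", which is where \G anchors
  $TAG =~ /<\w+/g;

  # list-context /g returns every capture: ("border", "ismap", "src", "alt")
  my @attrs = $TAG =~ /$attr/g;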
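
For what it's worth, here is the same approach wrapped in a sub and run against the tag from the question. This is only a sketch: the name attr_names is made up for illustration, and because the pattern uses \w+ for the name, it will stop at the first attribute whose name contains a character outside \w (a hyphen, for instance).

```perl
use strict;
use warnings;

# attr_names is a hypothetical helper, not part of the original post.
# It returns the attribute names of the first tag found in the string.
sub attr_names {
    my ($tag) = @_;    # work on a copy, so pos() starts out undefined

    my $attr = qr{
        \G \s* (\w+)            # attribute name, anchored at pos()
        (?:
          \s* = \s*
          (?: " [^"]* " | ' [^']* ' | [^\s>]+ )
        )?
    }x;

    # scalar-context /g: move pos() past "<tagname", or give up if no tag
    $tag =~ /<\w+/g or return;

    # list-context /g: collect every captured name from pos() onward
    my @names = $tag =~ /$attr/g;
    return @names;
}

my @names = attr_names(q{<P attr1="hello world" attr2 attr3="hi" attr4>});
print join(", ", @names), "\n";    # attr1, attr2, attr3, attr4
```

The \G anchor is what keeps the match from skipping ahead: as soon as the text at the current position is not an attribute (the closing ">", say), the list-context match ends instead of scanning further into the string.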

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk?  http://www.perlmonks.com/     http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc.     http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter.         Brother #734
**      Manning Publications, Co, is publishing my Perl Regex book      **
