Omega -1911 wrote on Friday, 1 December 2006, 06:05:
> Hello all,
>
> I am trying to parse calendar events from an RSS feed into variables. Can
> someone help with building the following regex or point me in the direction
> of some good examples? Thanks in advance.
>
> Here is what I have tried (I don't know much about complex regexes, as you
> can see):
> $mystring =~ /.+(<p><li><b>)(\w+) (<FONT COLOR=\"\#990000\">)(\w+)(\[Ref \#(\d+\])(.+)$/);
>
>
> Here is a sample string:
> <p><li><b> DATE <FONT COLOR="#990000">TITLE</FONT></b> EVENT
> <a href="http://www.mysite.com"target="_new";>www.mysite.com</a> [Ref #67579]</li>
>
> What I would like to pull out is the TITLE && EVENT information. The sample
> string is the format for each event. Any takers on this? Again, thanks for
> any help.

If you *really* want to do it with a regex rather than a parser (XML::LibXML,
XML::Simple, etc.), here is one possibility.

However, note that a regex is very fragile when it comes to format changes or
unexpected characters in the input. In the regex below, I try to be flexible
about white space in the input; one could also be more specific in the part
following the information to extract.

There are generally two somewhat contradictory aims:
- be as specific as possible, so as not to match unwanted content
- be liberal enough to handle format changes
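To make the trade-off concrete, here is a small sketch with two illustrative patterns for the title alone: a strict one that expects exactly the markup from your sample, and a liberal one that tolerates case and whitespace changes. The "variant" input (lowercase tags, extra spaces) is made up for the demonstration; it is not from your feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Markup exactly as in the original sample
my $exact   = '<FONT COLOR="#990000">TITLE</FONT>';
# Hypothetical variant: lowercase tags and extra whitespace
my $variant = '<font  color = "#990000" >TITLE</font>';

# Strict: literal markup, so little else can sneak in
my $strict  = qr{<FONT COLOR="\#990000">(\w+)</FONT>};
# Liberal: any case, optional whitespace, extra attributes allowed
my $liberal = qr{<font\s+color\s*=\s*"\#990000"[^>]*>\s*(.*?)\s*</font>}is;

for my $s ($exact, $variant) {
    my ($t1) = $s =~ $strict;     # list context: returns the capture
    my ($t2) = $s =~ $liberal;
    printf "strict: %-8s liberal: %s\n",
        defined $t1 ? $t1 : 'no match',
        defined $t2 ? $t2 : 'no match';
}
```

The strict pattern misses the variant entirely, while the liberal one catches both; the price is that the liberal one would also accept markup you might not have intended to match.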

How did you develop your regex? It doesn't seem to match what you want. One way
is to build it step by step: start by matching the strings between <p> and
</p>, check the result, make it more specific, check again, and so on.
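Building it step by step might look like this minimal sketch: each stage is a separate regex whose result you can print and check before moving on to the next, more specific one. The split into three stages and the variable names are just one illustrative way to do it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = '<p><li><b> DATE <FONT COLOR="#990000">TITLE1</FONT></b> EVENT1
<a href="http://www.mysite.com"target="_new";>www.mysite.com</a>
[Ref #67579]</li></p>';

# Stage 1: grab everything between <p> and </p> (/s lets . match newlines)
my @blocks = $input =~ m{<p>(.*?)</p>}gsi;
print scalar(@blocks), " block(s) found\n";

my @results;
for my $block (@blocks) {
    # Stage 2: within each block, pull out the title inside <font ...>
    my ($title) = $block =~ m{<font[^>]*>\s*(.*?)\s*</font>}si;
    # Stage 3: then the event text between </b> and the link
    my ($event) = $block =~ m{</b>\s*(.*?)\s*<a\b}si;
    print "title=<$title> event=<$event>\n";   # check this stage's output
    push @results, [$title, $event];
}
```

Once each stage prints what you expect, the pieces can be merged into a single pattern like the one below, if you prefer.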

Note that I escape the '#' in the regex because, under the /x modifier (which
allows comments), an unescaped '#' starts a comment.
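A quick way to see why the escape matters: under /x an unescaped # starts a comment running to the end of the pattern's line, so the rest of the literal silently disappears and the pattern matches far more than intended. The color strings below are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $wrong_color = '"#123456"';

# Unescaped: under /x everything from # to end of line is a comment,
# so this pattern is effectively just /^"/
my $unescaped = qr/^"#990000"$/x;
# Escaped: \# is a literal hash, so the whole color must match
my $escaped   = qr/^"\#990000"$/x;

print "unescaped: ", ($wrong_color =~ $unescaped ? "match" : "no match"), "\n";
print "escaped:   ", ($wrong_color =~ $escaped   ? "match" : "no match"), "\n";
```

The unescaped version happily "matches" the wrong color, which is exactly the kind of silent bug that is hard to spot in a long /x pattern.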

BEWARE: I did not spend hours on this. It just extracts what you want from the
$input shown.

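#!/usr/bin/perl
use strict;
use warnings;

# Sample input: two events in the format from the original post
my $input='
<p><li><b> DATE <FONT COLOR="#990000">TITLE1</FONT></b> EVENT1
<a href="http://www.mysite.com"target="_new";>www.mysite.com</a>
[Ref #67579]</li></p>
<p><li><b> DATE <FONT COLOR="#990000">TITLE2</FONT></b> EVENT2
<a href="http://www.mysite.com"target="_new";>www.mysite.com</a>
[Ref #67579]</li></p>
';

# ';' is the regex delimiter here. /x allows whitespace and comments,
# /s lets . match newlines, /i ignores case. In list context, m//g
# returns the captures of every match (TITLE1, EVENT1, TITLE2, EVENT2),
# which pair up as title => event entries of %info.
my %info = $input =~ m;
  <p>\s*
    <li>\s*
      <b>.*?
        <font\s*color\s*=\s*"\#990000"[^>]*?>\s*(.*?)\s*</font>\s*  # title
      </b>\s*(.*?)\s*<a.*?</a>\s*\[ref[^\]]+?\]\s*                  # event
    </li>\s*
  </p>
;mgxsi;

# Print each title => event pair, sorted by title
print map { "<$_> => <$info{$_}>\n" } sort keys %info;

__END__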
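One design note on the script above: because the captures go into a hash keyed by title, two events with the same title collapse into one, and the input order is lost (hence the sort). If that matters for your calendar, a while loop with m//g in scalar context keeps every event in order. A minimal sketch, assuming the same markup; the sample titles, the simplified links and the optional Ref-number capture are illustrative additions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two events that share the same title
my $input = '
<p><li><b> DATE <FONT COLOR="#990000">MEETING</FONT></b> first event
<a href="http://www.mysite.com">www.mysite.com</a> [Ref #1]</li></p>
<p><li><b> DATE <FONT COLOR="#990000">MEETING</FONT></b> second event
<a href="http://www.mysite.com">www.mysite.com</a> [Ref #2]</li></p>
';

my @events;
# In scalar context, m//g resumes after the previous match on each pass
while ($input =~ m{
        <font [^>]*> \s* (.*?) \s* </font> \s*   # capture 1: title
        </b> \s* (.*?) \s* <a\b .*? </a> \s*      # capture 2: event text
        \[ref \s* \# (\d+) \]                     # capture 3: ref number
    }gxsi) {
    push @events, { title => $1, event => $2, ref => $3 };
}

printf "%s: %s (Ref %s)\n", @{$_}{qw(title event ref)} for @events;
```

Both MEETING events survive here, whereas the hash version would keep only one of them.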

Dani

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>