Magnus Bodin <[EMAIL PROTECTED]> said something to this effect on 07/03/2001:
> The extracting regular expressions though, has a lot left as an
> exercise to the reader.
>
> I enclose here two functions that I use myself to extract URL:s
> from a message. This is not perfect and improvement suggestions
> are welcome. Especially broken URL:s (broken on two lines)
> should probably intelligently be pasted together somehow.
As long as we are using Perl, use the URI::Find module. It takes
the text to be searched and a callback to execute for each match.
use URI::Find;
my $msg = do { local $/; <>; };
find_uris($msg, sub { print "$_[1]\n" });
This will produce, when run against a random message in my
mailspool (readmsg $MAIL 3 | perl -e '...'):
http://advogato.org/
http://slashdot.org/
http://digitalmass.boston.com/
www.google.com
http://www.boston.com
URI::Find requires the URI family of Perl modules, but it is very
complete and very reliable. It handles multiline URIs, and
(un|improperly)-quoted URIs as well.
To make this sweeter, the callback to find_uris can do anything
you want, like reformat as HTML. The subroutine gets passed a
URI::URL object, which has tons of methods. If I change the
find_uris call above to read:
find_uris($msg, sub { print $_[0]->abs, "\n" });
I get these results:
http://advogato.org/
http://slashdot.org/
http://digitalmass.boston.com/
http://www.google.com/
http://www.boston.com/
(Notice how the last two URIs are well formed, rather than being
in the malformed state the sender of the message wrote them in;
see above.)
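As a rough sketch of what those methods allow, the callback below keeps
only http URIs and drops repeats of the same host. The sample text, the
%seen hash and the @http array are made up for illustration, and the
sketch assumes the object passed to the callback supports scheme and
host and stringifies to the full URI, as objects in the URI family
normally do.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical sample text; in practice this would be the message body.
my $msg = "Try http://slashdot.org/ or http://Slashdot.org/ "
        . "and ftp://ftp.example.org/pub/ too.\n";

my %seen;   # hosts already recorded
my @http;   # one http URI per distinct host

find_uris($msg, sub {
    my ($uri, $orig) = @_;   # URI object, text as originally written
    if ($uri->scheme eq 'http' && !$seen{ lc $uri->host }++) {
        push @http, "$uri";
    }
    return $orig;            # hand back the original text unchanged
});

print "$_\n" for @http;
```

Returning $orig from the callback is deliberate: the callback's return
value is substituted for the matched text, so this leaves $msg as it was.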
Formatting these URLs as HTML is trivial:
find_uris($msg, sub { printf qq(<a href="%s">%s</a>\n), $_[0]->abs, $_[1] });
This produces:
<a href="http://advogato.org/">http://advogato.org/</a>
<a href="http://slashdot.org/">http://slashdot.org/</a>
<a href="http://digitalmass.boston.com/">http://digitalmass.boston.com/</a>
<a href="http://www.google.com/">www.google.com</a>
<a href="http://www.boston.com/">http://www.boston.com</a>
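The same idea can be used to mark up the message itself rather than
print a list, since find_uris replaces each match in the text with
whatever the callback returns. This is only a sketch: the sample text
is made up, it relies on the URI object stringifying to the full URI,
and it does not HTML-escape the rest of the message, which a real
converter would need to do.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical sample text standing in for a mail body.
my $msg = "See http://slashdot.org/ for details.\n";

my $found = 0;
find_uris($msg, sub {
    my ($uri, $orig) = @_;   # URI object, text as originally written
    $found++;
    # The returned string replaces the matched URI inside $msg.
    return qq(<a href="$uri">$orig</a>);
});

print $msg;
print STDERR "rewrote $found URI(s)\n";
```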
Finally, as long as we're talking about procmail, we can write
this as a procmail recipe:
:0 c
* ^[EMAIL PROTECTED]
| perl -MURI::Find -le '{local$/;$f=<>}find_uris($f,sub{print($_[1])})' >> ~/urls
(darren)
--
Remember, UNIX spelled backwards is XINU.