On Thu, Jan 31, 2013 at 1:07 PM, Octavian Rasnita <orasn...@gmail.com> wrote:
> From: "Jeswin" <phillyj...@gmail.com>
>
>> ...
>
> It depends on what kind of emails you want to get from those web pages.
>
> If you want only the emails that appear in links, the easiest way is to
> use something like:
>
> # It will get the emails from links like:
> # <a href="mailto:n...@host.com">E-mail</a>
>
> use strict;
> use warnings;
> use LWP::Simple;
> use HTML::TreeBuilder;
>
> my $content = get( 'http://www.site.org/' );
> my $tree = HTML::TreeBuilder->new_from_content( $content );
>
> # Keep only <a> elements whose href starts with "mailto:"
> my @links = $tree->look_down( _tag => 'a', sub {
>     $_[0]->attr('href') && $_[0]->attr('href') =~ /^\s*mailto:/;
> } );
>
> my @emails;
>
> for my $link ( @links ) {
>     my $url = $link->attr('href');
>     $url =~ s/^\s*mailto:\s*//;
>     push( @emails, $url );    # store the address string, not the element
> }
>
> (Untested) You should have the emails in @emails.

Variant of above:

use HTML::TreeBuilder 5.03;
use URI;

# $some_url is the address of the page to scan
my $t = HTML::TreeBuilder->new_from_url( $some_url );
my @emails;
foreach my $link ( $t->look_down( _tag => 'a', href => qr/^\s*mailto:/ ) ) {
    my $url = URI->new( $link->attr('href') );
    push( @emails, $url->path );
}

> But if you want to get any e-mail from a page, whether or not it appears in a
> link, probably the easiest way would be to use regular expressions.
> Or search CPAN for a module that makes this easier.

-- 
Charles DeRykus
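
For the regular-expression route mentioned in the quoted text, here is a minimal sketch. The helper name extract_emails and the sample HTML string are made up for illustration, and the pattern is an assumption on my part: it is deliberately simple and is not a full RFC 5322 matcher, so expect it to miss unusual addresses and to accept some invalid ones.

```perl
use strict;
use warnings;

# Hypothetical helper: return the address-like strings found in $text,
# in order of first appearance, with case-insensitive duplicates dropped.
sub extract_emails {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{ lc $_ }++ }
        $text =~ /\b([A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)/g;
}

# Made-up sample input; in practice this would be the fetched page content.
my $html = '<p>Contact bob@example.com or '
         . '<a href="mailto:ann@example.org">Ann</a>; again bob@example.com.</p>';

my @emails = extract_emails($html);
print "$_\n" for @emails;
```

This catches addresses both inside and outside mailto links, since it scans the raw text instead of the parsed tree. If stricter parsing matters, a CPAN module is probably the better choice, as suggested above.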