Re: other ways to parse emails from html?

timothy adigun Thu, 31 Jan 2013 12:14:08 -0800

Hi,

On Thu, Jan 31, 2013 at 8:49 PM, Jeswin <phillyj...@gmail.com> wrote:


> Hi again,
> I tried to use the treebuilder modules to get emails from a webpage
> html but I don't know enough. It just gave me more headaches.
>

   Have you checked Rob's comments on your last post, and how he used the
module you asked about? If not, please check your last post.


> My current method get the emails is to go to the site, put the source
> code in MS Word, and run a regex to get all the emails in that html
> page.
>
> I think I can get the list of sites in a file and probably download
> the html source codes and parse offline. Can't I just use regex to
> parse the emails? What can go wrong?
>

   Why do the job twice? Why not let Perl make it easier, by fetching each
HTML document for you (using modules like LWP::Simple or LWP::UserAgent)
and parsing it with an HTML parser such as HTML::TreeBuilder,
HTML::TokeParser, or any other one you can get from CPAN?
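
As a rough sketch of that approach (not tested against your sites), something
like the following pulls addresses out of mailto: links with
HTML::TreeBuilder. The sample HTML and the emails_from_html helper name are
made up for illustration. For a real page you could fetch the source with
get($url) from LWP::Simple, which returns undef on failure, and pass that in
instead.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the unique addresses found in the
# mailto: links of an HTML string, in document order.
sub emails_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my ( %seen, @emails );
    for my $link ( $tree->look_down( _tag => 'a', href => qr/^mailto:/i ) ) {
        my $addr = $link->attr('href');
        $addr =~ s/^mailto://i;    # drop the scheme
        $addr =~ s/\?.*$//s;       # drop ?subject=... and similar
        push @emails, $addr unless $seen{ lc $addr }++;
    }
    $tree->delete;                 # free the parse tree
    return @emails;
}

# Sample page, made up for illustration.
my $html = <<'HTML';
<html><body>
<p>Contact <a href="mailto:alice@example.com">Alice</a> or
<a href="mailto:bob@example.org?subject=Hi">Bob</a>.</p>
<p><a href="http://example.com/">Not an email</a></p>
</body></html>
HTML

print "$_\n" for emails_from_html($html);
```

With the sample page this should print alice@example.com and
bob@example.org. Note that it only sees addresses inside mailto: links, not
ones written as plain text on the page.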

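It is possible to parse HTML with a regex, but it is not advisable: a regex
run over the raw source cannot tell page text from comments, attributes or
entity-encoded characters.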
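
To give an idea of what can go wrong, here is a small sketch using only core
Perl. The sample HTML and the pattern are my own, made up for illustration,
and the pattern is deliberately a simple one rather than a full address
matcher.

```perl
use strict;
use warnings;

# Sample source, made up for illustration.
my $html = <<'HTML';
<!-- old contact: retired@example.com -->
<p>Write to carol&#64;example.com</p>
<img src="logo@2x.png">
HTML

# A simple "looks like an email" pattern run over the raw source.
my @hits = $html =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g;

print "$_\n" for @hits;
```

This prints retired@example.com (which only exists in a comment) and
logo@2x.png (an image file name), and it misses carol's address because the
@ is written as the entity &#64;. An HTML parser decodes entities and lets
you look only at the parts of the page you care about.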

>
> I'm a noob at perl and not a programmer.
>
> Thanks for the input
>


-- 
Tim
