Hi,

On Thu, Jan 31, 2013 at 8:49 PM, Jeswin <phillyj...@gmail.com> wrote:
> Hi again,
>
> I tried to use the treebuilder modules to get emails from a webpage
> html but I don't know enough. It just gave me more headaches.
>

Have you checked Rob's comments on your last post and how he used the
module you asked about? If not, please check your last post.

> My current method get the emails is to go to the site, put the source
> code in MS Word, and run a regex to get all the emails in that html
> page.
>
> I think I can get the list of sites in a file and probably download
> the html source codes and parse offline. Can't I just use regex to
> parse the emails? What can go wrong?
>

Why do the job twice? Why not let Perl make your job easier by fetching
each HTML document for you (using modules like LWP::Simple,
LWP::UserAgent, etc.) and then parsing it with an HTML parser such as
HTML::TreeBuilder, HTML::TokeParser, or any other one available on CPAN?
Parsing HTML with a regex is possible, but it is not advisable.

> I'm a noob at perl and not a programmer.
>
> Thanks for the input
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/

--
Tim
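
For illustration, the fetch-then-parse approach suggested above could look roughly like the sketch below. It is only a sketch under some assumptions: the helper name extract_emails and the address pattern are made up for this example, the URLs are taken from the command line, and the pattern is deliberately loose rather than a full address validator. The regex is applied only to link targets and text nodes that the parser has already separated from the markup, not to the raw HTML source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;

# Hypothetical helper: return the unique, lowercased e-mail addresses
# found in an HTML string, sorted alphabetically.
sub extract_emails {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %seen;

    # 1. Addresses in <a href="mailto:..."> links.
    for my $link ( $tree->look_down( _tag => 'a', href => qr/^mailto:/i ) ) {
        my $addr = $link->attr('href');
        $addr =~ s/^mailto://i;
        $addr =~ s/\?.*//s;    # drop any ?subject=... part
        $seen{ lc $addr } = 1 if length $addr;
    }

    # 2. Addresses in text nodes, skipping <script> and <style> content.
    my @queue = ($tree);
    while ( my $node = shift @queue ) {
        for my $child ( $node->content_list ) {
            if ( ref $child ) {
                next if $child->tag =~ /^(?:script|style)$/;
                push @queue, $child;
            }
            else {
                # Loose pattern, assumed for this example only.
                $seen{ lc $1 } = 1
                    while $child =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g;
            }
        }
    }

    $tree->delete;    # free the tree explicitly
    return sort keys %seen;
}

# Usage: perl emails.pl URL [URL ...]
for my $url (@ARGV) {
    my $html = get($url);    # LWP::Simple::get returns undef on failure
    unless ( defined $html ) {
        warn "Could not fetch $url\n";
        next;
    }
    print "$_\n" for extract_emails($html);
}
```

To work from a file of sites instead, the loop could read the URLs line by line from that file rather than from @ARGV.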