Hi Aravind,

On Sat, 15 Dec 2012 17:54:23 +0100
venkates <venka...@nt.ntnu.no> wrote:
> Hi,
>
> I am trying to write a minimal web crawler. The aim is to discover new
> URLs from the seed and crawl these new URLs further. The code is as
> follows:
>
> use strict;
> use warnings;
> use Carp;
>
> use Data::Dumper;
> use WWW::Mechanize;
>
> my $url = "http://foobar.com"; # example
> my %links;
>
> my $mech = WWW::Mechanize->new(autocheck => 1);
>
> $mech->get($url);
>
> my @cr_fronteir = $mech->find_all_links();
>
> foreach my $links (@cr_fronteir) {
>     if ( $links->[0] =~ m/^http/xms ) {
>         $links{$links->[0]} = $links->[1]; #1;
>     }
> }
>
> How should I proceed further to crawl the links in %links, and how do I
> add depth to prevent overflow if done recursively? Any suggestions are
> much appreciated.

You should use a queue or a stack. See my code at:

https://metacpan.org/source/SHLOMIF/WWW-LinkChecker-Internal-v0.0.3/scripts/link-checker

Please let me know if you have problems understanding it. A rough sketch of
the queue-based approach is appended after my signature.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Best Introductory Programming Language - http://shlom.in/intro-lang

I invented the term Object-Oriented, and I can tell you I did not have C++
in mind.
    — Alan Kay (Attributed)

Please reply to list if it's a mailing list post - http://shlom.in/reply .
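Here is a minimal, untested sketch of what I mean by a queue-based crawl
with a depth limit, using WWW::Mechanize as in your code. The seed URL,
the depth limit, and the variable names are only illustrative:

#!/usr/bin/perl

use strict;
use warnings;

use WWW::Mechanize;

my $seed      = 'http://foobar.com';   # example seed, as in your post
my $max_depth = 2;                     # illustrative depth limit

my $mech = WWW::Mechanize->new( autocheck => 0 );

my %seen  = ( $seed => 1 );            # URLs already queued, to avoid revisits
my @queue = ( [ $seed, 0 ] );          # each entry: [ url, depth ]

while ( my $entry = shift @queue ) {
    my ( $url, $depth ) = @{$entry};

    $mech->get($url);
    next unless $mech->success();

    # Only HTML pages have links worth extracting.
    next unless $mech->is_html();

    print "Crawled $url (depth $depth)\n";

    # Stop expanding once the depth limit is reached.
    next if $depth >= $max_depth;

    for my $link ( $mech->find_all_links() ) {
        my $abs = $link->url_abs()->as_string();

        next unless $abs =~ m{\Ahttps?://}xms;
        next if $seen{$abs}++;

        push @queue, [ $abs, $depth + 1 ];
    }
}

Using shift gives a breadth-first crawl; swapping it for pop would make it
depth-first. The %seen hash keeps the crawler from revisiting URLs, and the
depth counter bounds how far it follows links from the seed, so neither
recursion nor an unbounded queue can blow up.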