Hi Aravind,

On Sat, 15 Dec 2012 17:54:23 +0100
venkates <venka...@nt.ntnu.no> wrote:

> Hi,
> 
> I am trying to write a minimal web crawler. The aim is to discover new 
> URLs from the seed and crawl these new URLs further. The code is as follows:
> 
> use strict;
> use warnings;
> use Carp;
> 
> use Data::Dumper;
> use WWW::Mechanize;
> 
> my $url = "http://foobar.com"; # example
> my %links;
> 
> my $mech = WWW::Mechanize->new(autocheck => 1);
> 
> $mech->get($url);
> 
> my @cr_frontier = $mech->find_all_links();
> 
> foreach my $link (@cr_frontier) {
>       if ( $link->url() =~ m/^http/xms ) {
>               $links{ $link->url() } = $link->text();
>       }
> }
> 
> How should I proceed to crawl the links in %links, and how do I limit
> the crawl depth to avoid runaway recursion if I do this recursively?
> Any suggestions would be much appreciated.
> 

You should use a queue (for a breadth-first crawl) or a stack (for a
depth-first one) instead of recursion, and keep each URL's depth alongside it
so you can stop expanding links past a maximum depth. See my code at:

https://metacpan.org/source/SHLOMIF/WWW-LinkChecker-Internal-v0.0.3/scripts/link-checker

Please let me know if you have problems understanding it.
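
To make the idea more concrete, here is a minimal, untested sketch of the
queue-based (breadth-first) variant with a depth limit. The seed URL and the
maximum depth of 2 are placeholders; adjust them to your needs:

use strict;
use warnings;

use WWW::Mechanize;

my $seed      = "http://foobar.com"; # placeholder seed, as in your example
my $max_depth = 2;                   # do not follow links deeper than this

my $mech = WWW::Mechanize->new( autocheck => 0 );

my %seen  = ( $seed => 1 );
my @queue = ( [ $seed, 0 ] );        # each entry is [ url, depth ]

while ( my $entry = shift @queue ) {
    my ( $url, $depth ) = @{$entry};

    $mech->get($url);
    next if !$mech->success();

    # Pages at the maximum depth are fetched, but their links are not followed.
    next if $depth >= $max_depth;

    foreach my $link ( $mech->find_all_links() ) {
        my $target = $link->url_abs()->as_string();
        next if $target !~ m{\Ahttps?://}xms;
        next if $seen{$target}++;
        push @queue, [ $target, $depth + 1 ];
    }
}

Using shift gives you breadth-first order; swapping it for pop would turn the
same loop into a depth-first crawl. The %seen hash prevents fetching the same
page twice, and the depth stored with each URL is what keeps the crawl from
growing without bound.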

Regards,

        Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Best Introductory Programming Language - http://shlom.in/intro-lang

I invented the term Object‐Oriented, and I can tell you I did not have C++ in
mind.                  — Alan Kay (Attributed)

Please reply to list if it's a mailing list post - http://shlom.in/reply .


