web crawler

venkates Sat, 15 Dec 2012 08:55:26 -0800

Hi,

I am trying to write a minimal web crawler. The aim is to discover new URLs from the seed and then crawl those new URLs further. The code is as follows:


use strict;
use warnings;
use Carp;

use Data::Dumper;
use WWW::Mechanize;

my $url = "http://foobar.com"; # example
my %links;

my $mech = WWW::Mechanize->new(autocheck => 1);

$mech->get($url);

my @cr_fronteir = $mech->find_all_links();

foreach my $links (@cr_fronteir) {
     if ( $links->[0] =~ m/^http/xms ) {
             $links{$links->[0]} = $links->[1]; #1;
     }
}

How should I proceed further to crawl the links in %links, and how do I add a depth limit to prevent overflow if this is done recursively? Any suggestions are much appreciated.


Thanks,

Aravind
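
One possible way to approach the question above, offered only as a sketch: replace recursion with a queue (breadth-first), and record the depth at which each URL was first seen, so nothing is fetched twice and nothing beyond the limit is expanded. The names crawl, mech_links and $max_depth are made up for illustration and are not part of WWW::Mechanize. The link extractor is passed in as a code ref so the traversal can be tried without network access.

```perl
use strict;
use warnings;

# Breadth-first crawl from $seed, following links at most $max_depth hops.
# $get_links is a code ref (hypothetical interface): given a URL, it returns
# the absolute URLs found on that page.
# Returns a hash ref mapping each discovered URL to the depth it was first seen at.
sub crawl {
    my ($seed, $max_depth, $get_links) = @_;

    my %seen  = ( $seed => 0 );   # URL => depth; doubles as the visited set
    my @queue = ( $seed );        # frontier of URLs still to fetch

    while ( defined( my $url = shift @queue ) ) {
        my $depth = $seen{$url};
        next if $depth >= $max_depth;   # pages at the limit are not expanded

        # A failed fetch should not abort the whole crawl.
        my @found = eval { $get_links->($url) };
        if ($@) {
            warn "skipping $url: $@";
            next;
        }

        for my $link (@found) {
            next unless $link =~ m/^https?:/i;   # same filter idea as in the post
            next if exists $seen{$link};         # already queued or fetched
            $seen{$link} = $depth + 1;
            push @queue, $link;
        }
    }
    return \%seen;
}

# Link extractor backed by WWW::Mechanize. The module is loaded lazily so
# that crawl() itself has no hard dependency on it.
sub mech_links {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # get() dies on errors
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->find_all_links();
}
```

With the seed from the post, a call might look like crawl("http://foobar.com", 2, \&mech_links), which would fetch the seed and the pages it links to directly, and record (without fetching) the links found on those pages. Because the queue is an ordinary array rather than the call stack, a large link graph cannot cause deep recursion; the depth limit only bounds how much gets fetched.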
