I want to get a list of the distinct domains (like perl.org) in all href
attribute values of anchor tags in all files under a given directory ending
in .htm or .html.  I don't need to know which files contain the links; I
just want to know which domains are referenced.  I don't care about
JavaScript links, and the code wouldn't have to crawl over HTTP, just scan
a filesystem.  It seems really easy: File::Find and HTML::TokeParser, parse
each file matching the criteria, and populate a hash.  If someone has
already written and tested something that works like this and would be
willing to share the code, I'd only have to make slight modifications,
which would save me some time.  Otherwise I'd be interested in suggestions
for how to write one.
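In case it helps anyone answer, here is a rough sketch of what I had in mind (untested, just the shape of it): File::Find to walk the tree, HTML::TokeParser to pull the href out of each <a> tag, and the URI module to extract the host part; relative and javascript: links are skipped since they have no http host.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use HTML::TokeParser;
use URI;

my $dir = shift @ARGV or die "Usage: $0 <directory>\n";
my %domains;    # hash used as a set of distinct hosts

find(
    sub {
        # only regular files ending in .htm or .html
        return unless -f && /\.html?$/i;

        my $parser = HTML::TokeParser->new($File::Find::name)
            or return;

        # walk every <a> tag in the file
        while ( my $tag = $parser->get_tag('a') ) {
            my $href = $tag->[1]{href} or next;
            my $uri  = URI->new($href);

            # relative links have no scheme; javascript:/mailto:
            # links have no http host -- skip all of those
            my $scheme = $uri->scheme or next;
            next unless $scheme =~ /^https?$/;

            $domains{ lc $uri->host } = 1;
        }
    },
    $dir
);

print "$_\n" for sort keys %domains;
```

This prints each distinct domain once, sorted.  The URI module isn't strictly necessary (a regex on the href would do), but it handles ports, userinfo, and odd schemes more robustly.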
