I want to get a list of the distinct domains (like perl.org) referenced in the href attribute values of anchor tags, across all files under a given directory that end in .htm or .html. I don't need to know which files contain the links; I just want to know which domains are referenced. I don't care about JavaScript links, and the Perl code would not have to crawl over HTTP, just scan a filesystem. It seems really easy: File::Find and HTML::TokeParser, parse each file matching the criteria, and populate a hash. If someone has already written and tested something that works like this and would be willing to share code, I would only have to make slight modifications, which would save me some time. Otherwise I would be interested in suggestions for how to write one.
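Not a tested, production-ready script, but here is a minimal sketch of the approach described above (File::Find to walk the tree, HTML::TokeParser to pull anchor tags, a hash to collect distinct domains). It assumes the URI module is available for extracting the host part, and that only absolute http/https links carry a domain worth recording:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use HTML::TokeParser;
use URI;

# Directory to scan comes from the command line; '.' is an assumed default.
my $dir = shift || '.';
my %domains;    # hash keyed by domain => count of occurrences

find(
    sub {
        # Only plain files ending in .htm or .html (case-insensitive).
        return unless -f && /\.html?$/i;

        my $p = HTML::TokeParser->new($File::Find::name) or return;

        # Walk every <a> tag and look at its href attribute.
        while ( my $tag = $p->get_tag('a') ) {
            my $href = $tag->[1]{href} or next;
            next if $href =~ /^javascript:/i;    # skip JavaScript links

            my $uri = URI->new($href);
            # Relative links have no scheme/host, so skip them;
            # the can() check guards against URI subclasses without a host method.
            next unless defined $uri->scheme
                     && $uri->scheme =~ /^https?$/
                     && $uri->can('host');

            $domains{ lc $uri->host }++;
        }
    },
    $dir
);

print "$_\n" for sort keys %domains;
```

Regexp-based scraping of href values would also work but breaks on quoting and whitespace variations, which is why the sketch sticks with a real tokenizer.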
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>