On 11/13/07, I BioKid <[EMAIL PROTECTED]> wrote:

> I have around 1000 HTML files, which I collected using different web
> crawling programs. I need to save these and use them as part of a
> database. But all the files have links to CGI programs, and all of
> these CGI links give only the path, /cgi-bin/foo/foo.pl. I don't have
> local copies of these programs from the remote servers. Is there any
> way to parse the HTML files and add the proper URL before
> /cgi-bin/foo/foo.pl?
Yes and no. Have you looked on CPAN? There are several modules
available for parsing HTML and managing URLs. But, in general, there's
no way to tell from the URL alone whether the server will or will not
call a CGI program. Of course, in many cases the presence of "cgi-bin"
indicates a program is there; but that rule yields many false
positives and many false negatives. Still, if you can identify such
URLs well enough for your own needs, the modules from CPAN should take
care of most of the task.

http://search.cpan.org/

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
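For what it's worth, here is a minimal sketch of the rewriting step being
discussed, assuming all the crawled pages come from a single known host
(the base URL `http://www.example.com` below is a stand-in, not something
from the original question). It uses a plain regex so it runs with no
non-core modules; for real work you would want a proper HTML parser from
CPAN such as HTML::Parser or HTML::TreeBuilder, together with the URI
module, rather than pattern matching on raw HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base URL of the remote server; substitute the real host.
my $base = 'http://www.example.com';

# Prefix root-relative CGI links with the server's base URL.
# Matches href/src/action attributes whose value starts with /cgi-bin/.
sub absolutize {
    my ($html) = @_;
    $html =~ s{(href|src|action)=(["'])(/cgi-bin/[^"']*)\2}
              {$1=$2$base$3$2}gi;
    return $html;
}

my $in = q{<a href="/cgi-bin/foo/foo.pl">run</a>};
print absolutize($in), "\n";
# prints: <a href="http://www.example.com/cgi-bin/foo/foo.pl">run</a>
```

To process all ~1000 files, you would loop over them (e.g. with `glob` or
File::Find), read each one, run it through a routine like this, and write
the result back out.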