On 11/13/07, I BioKid <[EMAIL PROTECTED]> wrote:

> I have around 1000 HTML files that I collected using different web
> crawling programs. I need to save them and use them as part of a
> database. But all the files contain links to CGI programs, and these
> links are given as site-relative paths like /cgi-bin/foo/foo.pl.
> I don't have local copies of these programs from the remote servers.
> Is there any way to parse the HTML files and add the proper URL before
> /cgi-bin/foo/foo.pl?

Yes and no. Have you looked on CPAN? There are several modules
available for parsing HTML and managing URLs. But, in general, there's
no way to tell from the URL alone whether the server will or will not
run a CGI program. Of course, in many cases the presence of
"cgi-bin" indicates a program is there; but that rule yields both
false positives and false negatives. Still, if you can identify
such URLs well enough for your own needs, the modules from CPAN
should take care of most of the task.

    http://search.cpan.org/
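For instance, if you know the base URL of the server each page came
from, something like this core-Perl sketch could prepend it to the
site-relative /cgi-bin links. The base URL here is made up, and the
regex is only a quick approximation; a real HTML parser from CPAN
handles the edge cases (unquoted attributes, links inside scripts,
and so on) that a regex will miss.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base URL of the remote server the pages came from.
my $base = 'http://www.example.com';

# Prepend $base to site-relative /cgi-bin/... links in href, src,
# and action attributes. A regex sketch only; prefer an HTML parser
# from CPAN for anything beyond a one-off cleanup.
sub absolutize {
    my ($html, $base) = @_;
    $html =~ s{((?:href|src|action)\s*=\s*["'])(/cgi-bin/[^"']*)}{$1$base$2}gi;
    return $html;
}
```

You would then read each file, run its contents through absolutize(),
and write the result back out.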

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
