On 16 September 2012 08:20, Santosh Kumar <sntshkm...@gmail.com> wrote:
> I want to extract (no, I don't want to download) all links that end in
> a certain extension.
>
> Suppose there is a webpage, and in the head of that webpage there are
> 4 different CSS files linked to an external server. Let the head look
> like this:
>
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
>
> Please note that I don't want to download those CSS files; instead I
> want something like this (to stdout):
>
> http://foo.bar/part1.css
> http://foo.bar/part2.css
> http://foo.bar/part3.css
> http://foo.bar/part4.css
>
> Also, I don't want to use external libraries. I am asking: which
> libraries and functions should I use?

If you don't want to use any third-party libraries, then the standard
library has a module urllib2 for downloading an HTML file and HTMLParser
for parsing it:

http://docs.python.org/library/urllib2.html#examples
http://docs.python.org/library/htmlparser.html#example-html-parser-application

Oscar
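A minimal sketch of that approach: subclass HTMLParser, watch for <link>
start tags, and print any href ending in ".css". (Shown here with the
Python 3 module name html.parser; in Python 2, as in this thread, the
import is `from HTMLParser import HTMLParser` and fetching uses urllib2.
The sample HTML below stands in for the fetched page.)

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class CSSLinkParser(HTMLParser):
    """Collect href values of <link> tags that end in .css."""

    def __init__(self):
        super().__init__()
        self.css_links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "link":
            href = dict(attrs).get("href") or ""
            if href.endswith(".css"):
                self.css_links.append(href)


# For a live page you would fetch the HTML first, e.g. with
# urllib.request.urlopen(url).read().decode() (urllib2.urlopen in Python 2).
html = """<head>
<link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
</head>"""

parser = CSSLinkParser()
parser.feed(html)
for url in parser.css_links:
    print(url)
```

Nothing is downloaded beyond the page itself; the parser only records the
URLs, which is exactly the "to stdout" behaviour asked for.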
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor