On 16 September 2012 08:20, Santosh Kumar <sntshkm...@gmail.com> wrote:
> I want to extract (no, I don't want to download) all links that end in
> a certain extension.
>
> Suppose there is a webpage, and in the head of that webpage there are
> 4 different CSS files linked to an external server. Let the head look
> like this:
>
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
>
> Please note that I don't want to download those CSS files; instead I
> want something like this (to stdout):
>
> http://foo.bar/part1.css
> http://foo.bar/part2.css
> http://foo.bar/part3.css
> http://foo.bar/part4.css
>
> Also, I don't want to use external libraries. I am asking: which
> libraries and functions should I use?

If you don't want to use any third-party libraries, then the standard
library has a module urllib2 for downloading an HTML file and HTMLParser
for parsing it:

http://docs.python.org/library/urllib2.html#examples
http://docs.python.org/library/htmlparser.html#example-html-parser-application

Oscar
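A minimal sketch of that approach: subclass HTMLParser, watch for <link>
start tags, and print any href ending in ".css". (Shown here with the
Python 3 module name html.parser; in Python 2, as in this thread, the
import is `from HTMLParser import HTMLParser` and fetching uses urllib2.
The sample HTML below stands in for the fetched page.)

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class CSSLinkParser(HTMLParser):
    """Collect href values of <link> tags that end in .css."""

    def __init__(self):
        super().__init__()
        self.css_links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "link":
            href = dict(attrs).get("href") or ""
            if href.endswith(".css"):
                self.css_links.append(href)


# For a live page you would fetch the HTML first, e.g. with
# urllib.request.urlopen(url).read().decode() (urllib2.urlopen in Python 2).
html = """<head>
<link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
</head>"""

parser = CSSLinkParser()
parser.feed(html)
for url in parser.css_links:
    print(url)
```

Nothing is downloaded beyond the page itself; the parser only records the
URLs, which is exactly the "to stdout" behaviour asked for.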
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor