Don't worry, it has been solved.
--
https://mail.python.org/mailman/listinfo/python-list
[EMAIL PROTECTED] wrote:
Hi everyone
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be great.
Check out mechanize. It wraps urllib2 and emulates a browser's behaviour.
Stefan Behnel <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to access the HTTP protocol with Python.
>>
>> Also, are there any open-source parsing engines for HTML documents
>> available in Python too? That would be great.
[EMAIL PROTECTED] wrote:
> I am trying to build my own web crawler for an experiment and I don't
> know how to access the HTTP protocol with Python.
>
> Also, are there any open-source parsing engines for HTML documents
> available in Python too? That would be great.
Try lxml.html. It parses broken HTML.
> Hi everyone
Hello
> I am trying to build my own web crawler for an experiment and I don't
> know how to access the HTTP protocol with Python.
urllib2: http://docs.python.org/lib/module-urllib2.html
> Also, are there any open-source parsing engines for HTML documents
> available in Python too? That would be great.
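The urllib2 pointer above can be sketched roughly as follows (in Python 3 the module was renamed urllib.request; the URL and the User-Agent string are placeholders, and the actual network call is left commented out):

```python
from urllib.request import Request, urlopen  # named "urllib2" in Python 2

# Build the request up front so crawler headers are explicit.
req = Request("http://example.com/", headers={"User-Agent": "my-crawler/0.1"})

print(req.full_url)                  # the target URL
print(req.get_header("User-agent")) # the header we set above

# The actual fetch (commented out here to avoid network access):
# html = urlopen(req).read()
```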
On Jun 28, 9:03 pm, [EMAIL PROTECTED] wrote:
> Hi everyone
> I am trying to build my own web crawler for an experiment and I don't
> know how to access the HTTP protocol with Python.
Look at the httplib module.
>
> Also, are there any open-source parsing engines for HTML documents
> available in Python too? That would be great.
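A similar sketch for the httplib route (renamed http.client in Python 3; example.com is a placeholder host, and the request itself is commented out so nothing touches the network):

```python
from http.client import HTTPConnection  # named "httplib" in Python 2

# Constructing the connection object does not open a socket yet.
conn = HTTPConnection("example.com", 80, timeout=5)
print(conn.host, conn.port)

# Sending the request and reading the response would look like:
# conn.request("GET", "/", headers={"User-Agent": "my-crawler/0.1"})
# resp = conn.getresponse()
# print(resp.status, resp.reason)
# body = resp.read()
# conn.close()
```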
On Sat, 28 Jun 2008 19:03:39 -0700, disappearedng wrote:
> Hi everyone
> I am trying to build my own web crawler for an experiment and I don't
> know how to access the HTTP protocol with Python.
>
> Also, are there any open-source parsing engines for HTML documents
> available in Python too? That would be great.
On Wed, 23 Jan 2008 10:40:14 -0200, Alnilam <[EMAIL PROTECTED]> wrote:
> Skipping past html validation, and html to xhtml 'cleaning', and
> instead starting with the assumption that I have files that are valid
> XHTML, can anyone give me a good example of how I would use htmllib,
> HTMLParser, or ElementTree?
On Jan 23, 2008 7:40 AM, Alnilam <[EMAIL PROTECTED]> wrote:
> Skipping past html validation, and html to xhtml 'cleaning', and
> instead starting with the assumption that I have files that are valid
> XHTML, can anyone give me a good example of how I would use htmllib,
> HTMLParser, or ElementTree?
On Jan 23, 3:54 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote:
> >> I was asking this community if there was a simple way to use only the
> >> tools included with Python to parse a bit of html.
>
> There are lots of ways of doing HTML parsing in Python. A common
> one is e.g. using mxTidy to convert the HTML into valid XHTML first.
> The pages I'm trying to write this code to run against aren't in the
> wild, though. They are static html files on my company's lan, are very
> consistent in format, and are (I believe) valid html.
Obvious way to check this is to go to http://validator.w3.org/ and see
what it tells you about your pages.
On 2008-01-23 01:29, Gabriel Genellina wrote:
> On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote:
>
>> On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>>> Alnilam wrote:
On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
>> Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
>> -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
>> 200-modules PyXML package installed. And you don't want the 75Kb
>> BeautifulSoup?
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
>
> > I was asking this community if there was a simple way to use only the
> > tools included with Python to parse a bit of html.
>
> If you *know* that your document is valid HTML, you can use the HTMLParser
> module in the standard library.
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
>
> > I was asking this community if there was a simple way to use only the
> > tools included with Python to parse a bit of html.
>
> If you *know* that your document is valid HTML, you can use the HTMLParser
> module in the standard library.
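A minimal sketch of that HTMLParser approach (the module was renamed html.parser in Python 3; the LinkCollector class and the sample markup are invented here for illustration):

```python
from html.parser import HTMLParser  # module named "HTMLParser" in Python 2

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

parser = LinkCollector()
parser.feed('<html><body><a href="/one">1</a> <a href="/two">2</a></body></html>')
print(parser.links)  # ['/one', '/two']
```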
On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote:
> On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>> Alnilam wrote:
>> > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
>> >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
>> >> > -1)) doesn't have an xml.dom.ext ...
On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> Alnilam wrote:
> > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
> >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
> >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
> >> > 200-modules PyXML package installed. And you don't want the 75Kb
> >> > BeautifulSoup?
Alnilam wrote:
> On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
>> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
>> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
>> > 200-modules PyXML package installed. And you don't want the 75Kb
>> > BeautifulSoup?
On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
> > 200-modules PyXML package installed. And you don't want the 75Kb
> > BeautifulSoup?
>
> I wasn't aware that I had PyXML installed, and can't find a reference
> to [...]
On Jan 22, 7:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
> ...I move from computer to
> computer regularly, and while all have a recent copy of Python, each
> has different (or no) extra modules, and I don't always have the
> luxury of downloading extras. That being said, if there's a simple way
> of [...]
> Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
> -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
> 200-modules PyXML package installed. And you don't want the 75Kb
> BeautifulSoup?
I wasn't aware that I had PyXML installed, and can't find a reference
to [...]
On 22 Jan, 06:31, Alnilam <[EMAIL PROTECTED]> wrote:
> Sorry for the noob question, but I've gone through the documentation
> on python.org, tried some of the diveintopython and boddie's examples,
> and looked through some of the numerous posts in this group on the
> subject and I'm still rather confused.
On Jan 22, 4:31 pm, Alnilam <[EMAIL PROTECTED]> wrote:
> Sorry for the noob question, but I've gone through the documentation
> on python.org, tried some of the diveintopython and boddie's examples,
> and looked through some of the numerous posts in this group on the
> subject and I'm still rather confused.
John Machin wrote:
> One can even use ElementTree, if the HTML is well-formed. See below.
> However if it is as ill-formed as the sample (4th "td" element not
> closed; I've omitted it below), then the OP would be better off
> sticking with Beautiful Soup :-)
Or (as we were talking about the best of both worlds) [...]
John Machin wrote:
> One can even use ElementTree, if the HTML is well-formed. See below.
> However if it is as ill-formed as the sample (4th "td" element not
> closed; I've omitted it below), then the OP would be better off
> sticking with Beautiful Soup :-)
or get the best of both worlds:
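Following John Machin's point that ElementTree works when the markup really is well-formed, a small sketch (the sample XHTML fragment is invented; ill-formed input would raise a ParseError, which is where Beautiful Soup earns its keep):

```python
import xml.etree.ElementTree as ET

# A well-formed fragment: every tag is properly closed.
xhtml = "<html><body><table><tr><td>LETTER</td><td>33,699</td></tr></table></body></html>"
root = ET.fromstring(xhtml)  # raises xml.etree.ElementTree.ParseError on broken input

# Walk the tree and pull out the text of every table cell.
cells = [td.text for td in root.iter("td")]
print(cells)  # ['LETTER', '33,699']
```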
On Feb 11, 6:05 pm, Ayaz Ahmed Khan <[EMAIL PROTECTED]> wrote:
> "mtuller" typed:
>
> > I have also tried Beautiful Soup, but had trouble understanding the
> > documentation
>
> As Gabriel has suggested, spend a little more time going through the
> documentation of BeautifulSoup. It is pretty easy to grasp.
"mtuller" typed:
> I have also tried Beautiful Soup, but had trouble understanding the
> documentation
As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.
I'll give you an example: I want to extract the text between the tags.
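Ayaz's actual example is cut off in the archive; here is a stdlib-only sketch in the same spirit, grabbing the text inside <td> cells (the CellGrabber class and the sample row are invented, and the original advice used BeautifulSoup rather than html.parser):

```python
from html.parser import HTMLParser

class CellGrabber(HTMLParser):
    """Collect the stripped text found inside each <td>...</td> pair."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.cells.append(data.strip())

g = CellGrabber()
g.feed("<tr><td>LETTER</td><td>33,699</td><td>1.0</td></tr>")
print(g.cells)  # ['LETTER', '33,699', '1.0']
```

From here, `g.cells[1]` could be cleaned of its comma and inserted into the database.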
On Sat, 10 Feb 2007 20:07:43 -0300, mtuller <[EMAIL PROTECTED]> wrote:
> [HTML fragment stripped by the archive; its cells contained: LETTER, 33,699, 1.0]
> I want to extract the 33,699 (which is dynamic) and set the value to a
> variable so that I can insert it into a database. I have tried parsing
> [...]
> I have also tried Beautiful Soup, but had trouble understanding the
> documentation.
On Nov 13, 1:12 pm, [EMAIL PROTECTED] wrote:
>
> I need a help on HTML parser.
>
>
> I saw a couple of python parsers like pyparsing, yappy, yapps, etc but
> they haven't given any example for HTML parsing.
Geez, how hard did you look? pyparsing's wiki menu includes an
'Examples' link, which takes you right to them.
[EMAIL PROTECTED] wrote:
> I am involved in one project which tends to collect news
> information published on selected, known web sites in the format of
> HTML, RSS, etc and shortlist them and create a bookmark on our website
> for the news content (we will use django for web development). [...]
[EMAIL PROTECTED] wrote:
> I am involved in one project which tends to collect news
> information published on selected, known web sites in the format of
> HTML, RSS, etc
I just can't imagine why anyone would still want to do this.
With RSS, it's an easy (if not trivial) problem.
With HTML, [...]
a combination of urllib, urllib2 and BeautifulSoup should do it.
Read BeautifulSoup's documentation to know how to browse through the
DOM.
[EMAIL PROTECTED] wrote:
> Hi All,
>
> I am involved in one project which tends to collect news
> information published on selected, known web sites in the format of
> HTML, RSS, etc.
[EMAIL PROTECTED] wrote:
> I need a help on HTML parser.
http://www.effbot.org/pyfaq/tutor-how-do-i-get-data-out-of-html.htm
>> this is a comment in JavaScript, which is itself inside an HTML comment
> Did you read the post?
misread it rather ...
[EMAIL PROTECTED] wrote:
> Python 2.3.5 seems to choke when trying to parse html files, because it
> doesn't realize that what's inside <!-- --> is a comment in HTML,
> even if this comment is inside <script></script>, especially if it's a
> // comment inside that script code too.
Nope. What's inside <script> is not a comment if [...]
"Istvan Albert" <[EMAIL PROTECTED]> wrote:
>
>> this is a comment in JavaScript, which is itself inside an HTML comment
>
>Don't nest HTML comments. Occasionally it may break the browsers as
>well.
Did you read the post? He didn't nest HTML comments. He put a Javascript
comment inside an HTML comment.
> this is a comment in JavaScript, which is itself inside an HTML comment
Don't nest HTML comments. Occasionally it may break the browsers as
well.
(I remember this from one of the weirdest of bughunts: whenever the
number of characters between nested HTML comments was divisible by four,
the page [...])
<[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Python 2.3.5 seems to choke when trying to parse html files, because it
> doesn't realize that what's inside <!-- --> is a comment in HTML,
> even if this comment is inside <script></script>, especially if it's a
> // comment inside that script code too.
Actually [...]
> // - this is a comment in JavaScript, which is itself inside
> an HTML comment
This is supposed to be one line. Got wrapped during posting.
Take a look at SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm) (SWEA). SWEA
creates an object model (automation interface) for any Web application
running in Internet Explorer. It supports all IE functionality: frames,
JavaScript, dialogs, downloads.
The runtime can [...]
John J. Lee wrote:
> Sanjay Arora <[EMAIL PROTECTED]> writes:
>
> > We are looking to select the language & toolset more suitable for a
> > project that requires getting data from several web sites in real-time
> > HTML parsing/scraping. It would require full emulation of the
> > browser, including handling cookies, [...]
Sanjay Arora <[EMAIL PROTECTED]> writes:
> We are looking to select the language & toolset more suitable for a
> project that requires getting data from several web sites in real-time
> HTML parsing/scraping. It would require full emulation of the
> browser, including handling cookies, [...]
"Fuzzyman" <[EMAIL PROTECTED]> writes:
> The standard library module for fetching HTML is urllib2.
Does urllib2 replace everything in urllib? I thought there was some
urllib functionality that urllib2 didn't do.
> There is a project called mechanize, built by John Lee on top of
> urllib2 and other standard modules.
The standard library module for fetching HTML is urllib2.
The best module for scraping the HTML is BeautifulSoup.
There is a project called mechanize, built by John Lee on top of
urllib2 and other standard modules.
It will emulate a browser's behaviour - including history, cookies,
basic authentication, and so on.
Sanjay Arora <[EMAIL PROTECTED]> writes:
> We are looking to select the language & toolset more suitable for a
> project that requires getting data from several web sites in real-time
> HTML parsing/scraping. It would require full emulation of the
> browser, including handling cookies, [...]