Re: Html Parsing stuff

2014-07-21 Thread Nicholas Cannon
Don't worry, it has been solved. -- https://mail.python.org/mailman/listinfo/python-list

Html Parsing stuff

2014-07-21 Thread Nicholas Cannon
OK, I get the basics of this, and I have done some successful parsing using regular expressions to find HTML tags. I have tried to find an img tag and write that image to a file, but I have had no success. It says it has successfully written the image to the file with a try... except statemen

Re: Beautifulsoup html parsing - nested tags

2011-01-05 Thread Selvam
On Wed, Jan 5, 2011 at 2:58 PM, Selvam wrote: > Hi all, > > I am trying to parse some html string with BeautifulSoup. > > The string is, > > > > > > > > Tax > > Base > >

Beautifulsoup html parsing - nested tags

2011-01-05 Thread Selvam
Hi all, I am trying to parse some html string with BeautifulSoup. The string is, Tax Base Amount rtables=soup.findAll(re.

Re: HTML Parsing

2008-06-30 Thread Larry Bates
[EMAIL PROTECTED] wrote: > Hi everyone > I am trying to build my own web crawler for an experiment and I don't > know how to access HTTP protocol with python. > > Also, are there any open-source parsing engines for HTML documents > available in Python too? That would be great. Check on Mechanize. It wraps

Re: HTML Parsing

2008-06-29 Thread Sebastian "lunar" Wiesner
Stefan Behnel <[EMAIL PROTECTED]>: > [EMAIL PROTECTED] wrote: >> I am trying to build my own web crawler for an experiment and I don't >> know how to access HTTP protocol with python. >> >> Also, are there any open-source parsing engines for HTML documents >> available in Python too? That would be

Re: HTML Parsing

2008-06-28 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > I am trying to build my own web crawler for an experiment and I don't > know how to access HTTP protocol with python. > > Also, are there any open-source parsing engines for HTML documents > available in Python too? That would be great. Try lxml.html. It parses broken HTM

Re: HTML Parsing

2008-06-28 Thread Victor Noagbodji
> Hi everyone Hello > I am trying to build my own web crawler for an experiment and I don't > know how to access HTTP protocol with python. urllib2: http://docs.python.org/lib/module-urllib2.html > Also, are there any open-source parsing engines for HTML documents > available in Python too? That w

Re: HTML Parsing

2008-06-28 Thread Benjamin
On Jun 28, 9:03 pm, [EMAIL PROTECTED] wrote: > Hi everyone > I am trying to build my own web crawler for an experiment and I don't > know how to access HTTP protocol with python. Look at the httplib module. > > Also, are there any open-source parsing engines for HTML documents > available in Pytho

Re: HTML Parsing

2008-06-28 Thread Dan Stromberg
On Sat, 28 Jun 2008 19:03:39 -0700, disappearedng wrote: > Hi everyone > I am trying to build my own web crawler for an experiment and I don't > know how to access HTTP protocol with python. > > Also, are there any open-source parsing engines for HTML documents > available in Python too? That woul

HTML Parsing

2008-06-28 Thread disappearedng
Hi everyone I am trying to build my own web crawler for an experiment and I don't know how to access HTTP protocol with python. Also, are there any open-source parsing engines for HTML documents available in Python too? That would be great.
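
For anyone finding this thread today, here is a minimal sketch of the parsing half of a crawler using only the standard library. It assumes Python 3, where the urllib2 and httplib modules named in the replies live on as urllib.request and http.client. The LinkCollector class, the extract_links helper and the sample markup are made up for illustration and do not come from any of the posts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return every link found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical page content; a real crawler would fetch it with
# urllib.request.urlopen(url).read() and decode the bytes first.
sample = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
links = extract_links(sample, "http://example.com/index.html")
```

A crawler would then push the returned links onto a queue of pages still to visit. For badly broken markup, the third-party parsers recommended in the replies are likely a better fit than this sketch.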

Re: HTML parsing confusion

2008-01-23 Thread Gabriel Genellina
On Wed, 23 Jan 2008 10:40:14 -0200, Alnilam <[EMAIL PROTECTED]> wrote: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser

Re: HTML parsing confusion

2008-01-23 Thread Jerry Hill
On Jan 23, 2008 7:40 AM, Alnilam <[EMAIL PROTECTED]> wrote: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser, or ElementTre
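
Under the assumption stated in the question (files that really are valid XHTML), one possible sketch with the standard library's ElementTree looks like this. It is written for Python 3. The sample document, the XHTML_NS mapping name and the variable names are invented for illustration and are not taken from the thread.

```python
import xml.etree.ElementTree as ET

# Valid XHTML puts every element in this namespace, so searches must name it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical document standing in for one of the static files.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Report</title></head>
  <body>
    <p class="note">First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>"""

root = ET.fromstring(doc)

# Look up a single element by path, then every <p> anywhere in the tree.
title = root.findtext("x:head/x:title", namespaces=XHTML_NS)
paragraphs = [p.text for p in root.iterfind(".//x:p", XHTML_NS)]
```

This only works while the input stays well-formed XML. A stray unclosed tag or an undefined named entity such as &nbsp; makes the parser raise an error, which is where the more forgiving parsers mentioned elsewhere in the thread come in.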

Re: HTML parsing confusion

2008-01-23 Thread Alnilam
On Jan 23, 3:54 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote: > >> I was asking this community if there was a simple way to use only the > >> tools included with Python to parse a bit of html. > > There are lots of ways doing HTML parsing in Python. A comm

Re: HTML parsing confusion

2008-01-23 Thread cokofreedom
> The pages I'm trying to write this code to run against aren't in the > wild, though. They are static html files on my company's lan, are very > consistent in format, and are (I believe) valid html. Obvious way to check this is to go to http://validator.w3.org/ and see what it tells you about you

Re: HTML parsing confusion

2008-01-23 Thread M.-A. Lemburg
for tasks that require them, when the capability >> to do something clearly can't be done easily another way (eg. >> MySQLdb). I am sure that there will be plenty of times where I will >> use BeautifulSoup. In this instance, however, I was trying to solve a >>

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser   > module in the stan

Re: HTML parsing confusion

2008-01-22 Thread [EMAIL PROTECTED]
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser > module in the stand

Re: HTML parsing confusion

2008-01-22 Thread Gabriel Genellina
On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote: > On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: >> Alnilam wrote: >> > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >>

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: > Alnilam wrote: > > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > >>

Re: HTML parsing confusion

2008-01-22 Thread Diez B. Roggisch
Alnilam wrote: > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous >> > 200-modules PyXML package installed. And you don't want the 75Kb >> > Beau

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > > 200-modules PyXML package installed. And you don't want the 75Kb > > BeautifulSoup? > > I wasn'

Re: HTML parsing confusion

2008-01-22 Thread Paul McGuire
On Jan 22, 7:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > ...I move from computer to > computer regularly, and while all have a recent copy of Python, each > has different (or no) extra modules, and I don't always have the > luxury of downloading extras. That being said, if there's a simple way > of

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
> Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > 200-modules PyXML package installed. And you don't want the 75Kb > BeautifulSoup? I wasn't aware that I had PyXML installed, and can't find a reference to

Re: HTML parsing confusion

2008-01-22 Thread Paul Boddie
On 22 Jan, 06:31, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather co

Re: HTML parsing confusion

2008-01-22 Thread John Machin
On Jan 22, 4:31 pm, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather

HTML parsing confusion

2008-01-21 Thread Alnilam
Sorry for the noob question, but I've gone through the documentation on python.org, tried some of the diveintopython and boddie's examples, and looked through some of the numerous posts in this group on the subject and I'm still rather confused. I know that there are some great tools out there for

Re: How to Encode Parameters into an HTML Parsing Script

2007-06-22 Thread SMERSH009X
On Jun 21, 9:45 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > On Thu, 21 Jun 2007 23:37:07 -0300, <[EMAIL PROTECTED]> wrote: > > > So for example if I wanted to navigate to an encoded url > > http://online.investools.com/landing.iedu?signedin=true rather than > > just http://online.investool

Re: How to Encode Parameters into an HTML Parsing Script

2007-06-21 Thread Gabriel Genellina
On Thu, 21 Jun 2007 23:37:07 -0300, <[EMAIL PROTECTED]> wrote: > So for example if I wanted to navigate to an encoded url > http://online.investools.com/landing.iedu?signedin=true rather than > just http://online.investools.com/landing.iedu How would I do this? > How can I modify the script t
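
One way to build such a URL, sketched here for Python 3 with urllib.parse.urlencode (in the Python 2 of this thread's era the same function was urllib.urlencode). The with_params helper is a made-up name for illustration, and the reply above is truncated, so this is not necessarily the approach it went on to describe.

```python
from urllib.parse import urlencode


def with_params(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    query = urlencode(params)
    if not query:
        return base_url
    # Extend an existing query string with "&", otherwise start one with "?".
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


url = with_params(
    "http://online.investools.com/landing.iedu",
    {"signedin": "true"},
)

# Special characters in values are escaped by urlencode.
escaped = urlencode({"q": "a b&c"})
```

Each entry of a list such as urlList could then be passed through with_params before it is fetched.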

How to Encode Parameters into an HTML Parsing Script

2007-06-21 Thread SMERSH009X
I've written a script that navigates various URLs on a website and fetches the contents. The URLs are fed from a list "urlList". Everything seems to work splendidly until I introduce the concept of encoding parameters for a certain URL. So for example if I wanted to navigate to an encoded

Re: Output of HTML parsing

2007-06-19 Thread Stefan Behnel
Jackie wrote: > On Jun 15, 2:01, Stefan Behnel <[EMAIL PROTECTED]> wrote: >> Jackie wrote: > >> import lxml.etree as et >> url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/" >> tree = et.parse(url) >> > >> Stefan > > Thank you. But when I t

Re: Output of HTML parsing

2007-06-19 Thread Jackie
On Jun 15, 2:01, Stefan Behnel <[EMAIL PROTECTED]> wrote: > Jackie wrote: > import lxml.etree as et > url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/" > tree = et.parse(url) > > Stefan Thank you. But when I tried to run the above part, the fo

Output of html parsing

2007-06-16 Thread Jackie Wang
Hi, all, I want to get the information of the professors (name,title) from the following link: "http://www.economics.utoronto.ca/index.php/index/person/faculty/" Ideally, I'd like to have an output file where each line is one Prof, including his name and title. In practice, I us

Re: Output of HTML parsing

2007-06-15 Thread Stefan Behnel
Jackie wrote: > I want to get the information of the professors (name,title) from the > following link: > > "http://www.economics.utoronto.ca/index.php/index/person/faculty/" That's even XHTML, no need to go through BeautifulSoup. Use lxml instead. http://codespeak.net/lxml > Ideally, I'd lik

Re: Output of HTML parsing

2007-06-15 Thread Sebastian Wiesner
[ Jackie <[EMAIL PROTECTED]> ] > 1. The code above assumes that each Prof has a title. If any one of them > does not, the name and title will be mismatched. How can I allow > the title to be empty? > > 2. Is there any easier way to get the data I want other than using > list? Use BeautifulS

Output of HTML parsing

2007-06-15 Thread Jackie
Hi, all, I want to get the information of the professors (name,title) from the following link: "http://www.economics.utoronto.ca/index.php/index/person/faculty/" Ideally, I'd like to have an output file where each line is one Prof, including his name and title. In practice, I use the CSV module.
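
On the question raised in this thread about names and titles getting mismatched when a title is missing, one possible approach is to keep each person as a single record and default the title to an empty string, instead of building two parallel lists. The sketch below shows only the CSV side, for Python 3. The people list, its field names and the write_rows helper are hypothetical and do not reflect the real page or the original code.

```python
import csv
import io

# Hypothetical records, as they might come out of a parser that visits
# one person at a time; the second entry has no title.
people = [
    {"name": "A. Smith", "title": "Professor"},
    {"name": "B. Jones"},
    {"name": "C. Lee", "title": "Associate Professor"},
]


def write_rows(records, out):
    """Write one CSV line per person, with an empty title when none is known."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "title"])
    for record in records:
        writer.writerow([record["name"], record.get("title", "")])


# An in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_rows(people, buffer)
csv_text = buffer.getvalue()
```

Because the title is looked up per record, a missing one leaves an empty column and cannot shift the titles that follow.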

Re: HTML Parsing

2007-02-25 Thread Stefan Behnel
John Machin wrote: > One can even use ElementTree, if the HTML is well-formed. See below. > However if it is as ill-formed as the sample (4th "td" element not > closed; I've omitted it below), then the OP would be better off > sticking with Beautiful Soup :-) Or (as we were talking about the best

Re: HTML Parsing

2007-02-11 Thread Fredrik Lundh
John Machin wrote: > One can even use ElementTree, if the HTML is well-formed. See below. > However if it is as ill-formed as the sample (4th "td" element not > closed; I've omitted it below), then the OP would be better off > sticking with Beautiful Soup :-) or get the best of both worlds:

Re: HTML Parsing

2007-02-11 Thread John Machin
On Feb 11, 6:05 pm, Ayaz Ahmed Khan <[EMAIL PROTECTED]> wrote: > "mtuller" typed: > > > I have also tried Beautiful Soup, but had trouble understanding the > > documentation > > As Gabriel has suggested, spend a little more time going through the > documentation of BeautifulSoup. It is pretty easy

Re: HTML Parsing

2007-02-10 Thread Ayaz Ahmed Khan
"mtuller" typed: > I have also tried Beautiful Soup, but had trouble understanding the > documentation As Gabriel has suggested, spend a little more time going through the documentation of BeautifulSoup. It is pretty easy to grasp. I'll give you an example: I want to extract the text between the

Re: HTML Parsing

2007-02-10 Thread Gabriel Genellina
On Sat, 10 Feb 2007 20:07:43 -0300, mtuller <[EMAIL PROTECTED]> wrote: > > > LETTER > > 33,699 > > 1.0 > > > > I want to extract the 33,699 (which is dynamic) and set the value to a > variable so that I can insert it into a database. I have tried parsing > [...] > I have also tried Beau

HTML Parsing

2007-02-10 Thread mtuller
Alright. I have tried everything I can find, but am not getting anywhere. I have a web page that has data like this: LETTER 33,699 1.0 What is shown is only a small section. I want to extract the 33,699 (which is dynamic) and set the value to a variable so that I can insert it into a databa
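
The markup of the sample was stripped by the archive, so the sketch below assumes the three values sit in sibling td cells of one table row, which is a guess based on the replies mentioning "td" elements. It uses the Python 3 standard library's html.parser. The CellCollector class and the sample string are made up for illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Gather the text of each <td>, grouped by the <tr> that contains it."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Assumed markup; the real page may differ.
sample = "<table><tr><td>LETTER</td><td>33,699</td><td>1.0</td></tr></table>"
parser = CellCollector()
parser.feed(sample)
parser.close()

label, count_text, ratio_text = parser.rows[0]
count = int(count_text.replace(",", ""))  # drop the thousands separator
```

This sketch expects every td to be closed. For a page with unclosed cells, as one reply notes about the sample, a more tolerant parser such as Beautiful Soup is probably the safer choice.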

Re: HTML Parsing and Indexing

2006-11-16 Thread Paul McGuire
On Nov 13, 1:12 pm, [EMAIL PROTECTED] wrote: > > I need help with an HTML parser. > > > I saw a couple of python parsers like pyparsing, yappy, yapps, etc but > they haven't given any example for HTML parsing. Geez, how hard did you look? pyparsing's wiki menu includ

Re: HTML Parsing and Indexing

2006-11-13 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > I am involved in one project which tends to collect news > information published on selected, known web sites in the format of > HTML, RSS, etc and sortlist them and create a bookmark on our website > for the news content(we will use django for web development). Curren

Re: HTML Parsing and Indexing

2006-11-13 Thread Andy Dingley
[EMAIL PROTECTED] wrote: > I am involved in one project which tends to collect news > information published on selected, known web sites in the format of > HTML, RSS, etc I just can't imagine why anyone would still want to do this. With RSS, it's an easy (if not trivial) problem. With HTML

Re: HTML Parsing and Indexing

2006-11-13 Thread Bernard
te some small amount of code for each web site if > required. But Crawler, Parser and Indexer need to run unattended. I > don't know how to proceed next.. > > I saw a couple of python parsers like pyparsing, yappy, yapps, etc but > they haven't given any example for HTML parsin

Re: HTML Parsing and Indexing

2006-11-13 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: > I need help with an HTML parser. http://www.effbot.org/pyfaq/tutor-how-do-i-get-data-out-of-html.htm

HTML Parsing and Indexing

2006-11-13 Thread mailtogops
given any example for HTML parsing. Someone recommended using "lynx" to convert the page into the text and parse the data. That also looks good but I still end up writing a huge chunk of code for each web page. What we need is one nice parser which should work on HTML/text file (lynx outpu

Re: HTML parsing bug?

2006-02-02 Thread Istvan Albert
>> this is a comment in JavaScript, which is itself inside an HTML comment > Did you read the post? misread it rather ...

Re: HTML parsing bug?

2006-02-01 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: > Python 2.3.5 seems to choke when trying to parse html files, because it > doesn't realize that what's inside is a comment in HTML, > even if this comment is inside , especially if it's a > comment inside that script code too. nope. what's inside is not a comment if

Re: HTML parsing bug?

2006-02-01 Thread Tim Roberts
"Istvan Albert" <[EMAIL PROTECTED]> wrote: > >> this is a comment in JavaScript, which is itself inside an HTML comment > >Don't nest HTML comments. Occasionaly it may break the browsers as >well. Did you read the post? He didn't nest HTML comments. He put a Javascript comment inside an HTML com

Re: HTML parsing bug?

2006-01-30 Thread Istvan Albert
> this is a comment in JavaScript, which is itself inside an HTML comment Don't nest HTML comments. Occasionally it may break the browsers as well. (I remember this from one of the weirdest of bughunts : whenever the number of characters between nested HTML comments was divisible by four the page

Re: HTML parsing bug?

2006-01-30 Thread Richard Brodie
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Python 2.3.5 seems to choke when trying to parse html files, because it > doesn't realize that what's inside is a comment in HTML, > even if this comment is inside , especially if it's a > comment inside that script code too. Actua

Re: HTML parsing bug?

2006-01-30 Thread G.
> // - this is a comment in JavaScript, which is itself inside > an HTML comment This is supposed to be one line. Got wrapped during posting.

HTML parsing bug?

2006-01-30 Thread g_no_mail_please
Python 2.3.5 seems to choke when trying to parse html files, because it doesn't realize that what's inside is a comment in HTML, even if this comment is inside , especially if it's a comment inside that script code too. The html file: Choke on this
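
As a point of comparison rather than a reproduction of the 2.3.5 behaviour, the sketch below shows how the html.parser module in current Python 3 treats this kind of input: the content of a script element is passed through as plain data, and only comments outside it reach handle_comment. The Recorder class and the page string are invented for illustration, since the archive stripped the original file's markup.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Record HTML comments and the raw text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


# Hypothetical page with a comment-wrapped script and one ordinary comment.
page = """<html><head><script type="text/javascript">
<!--
// this is a comment in JavaScript, inside an HTML-style comment
var x = 1;
//-->
</script></head><body><!-- a real comment --><p>Choke on this</p></body></html>"""

rec = Recorder()
rec.feed(page)
rec.close()
script_source = "".join(rec.script_text)
```

Here the comment markers inside the script stay part of script_source, and rec.comments holds only the comment from the body.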