Re: HTML Parser

2013-07-02 Thread Joshua Landau
On 2 July 2013 18:43, wrote: > I could not use BeautifulSoup as I did not find an .exe file. Were you perhaps looking for a .exe file to install BeautifulSoup? It's quite plausible that a Windows user like you might be dazzled at the idea of a .tar.gz. I suggest just using "pip install beautifu

Re: HTML Parser

2013-07-02 Thread Steven D'Aprano
On Tue, 02 Jul 2013 10:43:03 -0700, subhabangalore wrote: > I could not use BeautifulSoup as I did not find an .exe file. I believe that BeautifulSoup is a pure-Python module, and so does not have a .exe file. However, it does have good tutorials: https://duckduckgo.com/html/?q=beautifulsoup+tu

Re: HTML Parser

2013-07-02 Thread Neil Cerutti
On 2013-07-02, subhabangal...@gmail.com wrote: > Dear Group, > > I was looking for a good tutorial for a "HTML Parser". My > intention was to extract tables from web pages or information > from tables in web pages. > > I tried to make a search, I got HTMLParser, BeautifulSoup, etc. > HTMLParser w
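For the table-extraction goal in the question above, here is a minimal sketch using only the standard library's html.parser module (the Python 3 name for HTMLParser). The names TableExtractor and extract_tables are made up for illustration, and the sketch assumes flat (non-nested) tables whose cells have explicit closing tags; real-world pages may need BeautifulSoup or lxml instead.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of each <table> as a list of rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per <table> seen so far
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return a list of tables; each table is a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling extract_tables(page_source) on a page's HTML string returns nested lists that can be written out with the csv module or inspected directly.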

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread Tim Arnold
"Robert" wrote in message news:hk729b$na...@news.albasani.net... > Stefan Behnel wrote: >> Robert, 01.02.2010 14:36: >>> Stefan Behnel wrote: Robert, 31.01.2010 20:57: > I tried lxml, but after walking and making changes in the element > tree, > I'm forced to do a full serializ

Re: HTML Parser which allows low-keyed local changes?

2010-02-01 Thread Nobody
On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote: > I tried lxml, but after walking and making changes in the element > tree, I'm forced to do a full serialization of the whole document > (etree.tostring(tree)) - which destroys the "human edited" format > of the original HTML code. > makes it r

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread M.-A. Lemburg
Robert wrote: > I think you confused the logical level of what I meant with "file > position": > Of course it's not about (necessarily) writing back to the same open file > (OS-level), but regarding the whole serialization string (wherever it > is finally written to - I typically write the auto-con

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread Robert
Stefan Behnel wrote: Robert, 01.02.2010 14:36: Stefan Behnel wrote: Robert, 31.01.2010 20:57: I tried lxml, but after walking and making changes in the element tree, I'm forced to do a full serialization of the whole document (etree.tostring(tree)) - which destroys the "human edited" format of

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread Stefan Behnel
Robert, 01.02.2010 14:36: > Stefan Behnel wrote: >> Robert, 31.01.2010 20:57: >>> I tried lxml, but after walking and making changes in the element tree, >>> I'm forced to do a full serialization of the whole document >>> (etree.tostring(tree)) - which destroys the "human edited" format of the >>>

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread Robert
Robert wrote: Stefan Behnel wrote: Robert, 31.01.2010 20:57: I tried lxml, but after walking and making changes in the element tree, I'm forced to do a full serialization of the whole document (etree.tostring(tree)) - which destroys the "human edited" format of the original HTML code. makes it

Re: HTML Parser which allows low-keyed local changes (upon serialization)

2010-02-01 Thread Robert
Stefan Behnel wrote: Robert, 31.01.2010 20:57: I tried lxml, but after walking and making changes in the element tree, I'm forced to do a full serialization of the whole document (etree.tostring(tree)) - which destroys the "human edited" format of the original HTML code. makes it rather unreadab

Re: HTML Parser which allows low-keyed local changes?

2010-02-01 Thread Stefan Behnel
Robert, 31.01.2010 20:57: > I tried lxml, but after walking and making changes in the element tree, > I'm forced to do a full serialization of the whole document > (etree.tostring(tree)) - which destroys the "human edited" format of the > original HTML code. makes it rather unreadable. What do you

Re: HTML Parser for Jython

2007-10-17 Thread Falcolas
On Oct 17, 9:50 am, Carsten Haese <[EMAIL PROTECTED]> wrote: > Recent releases of BeautifulSoup need Python 2.3+, so they won't work on > current Jython, but BeautifulSoup 1.x will work. Thank you. -- http://mail.python.org/mailman/listinfo/python-list

Re: HTML Parser for Jython

2007-10-17 Thread Tim Chase
> Does anybody know of a decent HTML parser for Jython? I have to do > some screen scraping, and would rather use a tested module instead of > rolling my own. GIYF[0][1] There are the batteries-included HTMLParser[2] and htmllib[3] modules, and the ever-popular (and more developer-friendly) Beau

Re: HTML Parser for Jython

2007-10-17 Thread Carsten Haese
On Wed, 2007-10-17 at 17:36 +0200, Stefan Behnel wrote: > Falcolas wrote: > > Does anybody know of a decent HTML parser for Jython? I have to do > > some screen scraping, and would rather use a tested module instead of > > rolling my own. > > Not sure if it works, but have you tried BeautifulSoup?

Re: HTML Parser for Jython

2007-10-17 Thread Stefan Behnel
Falcolas wrote: > Does anybody know of a decent HTML parser for Jython? I have to do > some screen scraping, and would rather use a tested module instead of > rolling my own. Not sure if it works, but have you tried BeautifulSoup? Or maybe an older version of it? Stefan -- http://mail.python.org

Re: Html parser

2007-06-15 Thread Nikita the Spider
In article <[EMAIL PROTECTED]>, Stephen R Laniel <[EMAIL PROTECTED]> wrote: > On Fri, Jun 15, 2007 at 07:11:56AM -0700, HMS Surprise wrote: > > Could you recommend an html parser that works with python (jython > > 2.2)? > > I'm new here, but I believe BeautifulSoup is the canonical > answer: >

Re: Html parser

2007-06-15 Thread Lee Hinde
On Jun 15, 7:11 am, HMS Surprise <[EMAIL PROTECTED]> wrote: > Could you recommend an html parser that works with python (jython > 2.2)? HTMLParser does not seem to be in this library. To test some > of our browser based (mainly php) code I am looking for field names and > values associated with them.

Re: Html parser

2007-06-15 Thread HMS Surprise
Thanks, jh

Re: Html parser

2007-06-15 Thread Stephen R Laniel
On Fri, Jun 15, 2007 at 07:11:56AM -0700, HMS Surprise wrote: > Could you recommend an html parser that works with python (jython > 2.2)? I'm new here, but I believe BeautifulSoup is the canonical answer: http://www.crummy.com/software/BeautifulSoup/ -- Stephen R. Laniel [EMAIL PROTECTED] Cell:

Re: HTML Parser in python

2007-04-06 Thread eknowles
Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/ Works, well...beautifully.

Re: HTML Parser in python

2007-04-06 Thread kyosohma
On Apr 6, 1:05 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Hi, > > Is there a HTML parser (not xml) in python? > I need a html parser which has the ability to handle mal-format html > pages. > > Thank you. Yeah...it's called Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/

Re: html parser , unexpected '<' char in declaration

2006-02-21 Thread Jesus Rivero (Neurogeek)
Oopss! You are totally right guys, i did miss the closing '>' thinking about maybe errors in the use of ' or ". Jesus Tim Roberts wrote: >"Jesus Rivero - (Neurogeek)" <[EMAIL PROTECTED]> wrote: > > >>hmmm, that's kind of different issue then. >> >>I can guess, from the error you pasted earlie

Re: html parser , unexpected '<' char in declaration

2006-02-21 Thread Sakcee
thanks for the suggestions, this is not happening frequently, actually this is the first time I have seen this exception in the system, which means that some spam message was generated with ill-formatted html. i guess the best way would be to check using a regular expression and delete the unclosed t

Re: html parser , unexpected '<' char in declaration

2006-02-21 Thread Tim Roberts
"Jesus Rivero - (Neurogeek)" <[EMAIL PROTECTED]> wrote: > >hmmm, that's kind of different issue then. > >I can guess, from the error you pasted earlier, that the problem shown >is due to the fact Python is interpreting a "<" as an expression and not >as a char. review your code or try to figure out

Re: html parser , unexpected '<' char in declaration

2006-02-20 Thread Jesus Rivero - (Neurogeek)
hmmm, that's kind of different issue then. I can guess, from the error you pasted earlier, that the problem shown is due to the fact Python is interpreting a "<" as an expression and not as a char. review your code or try to figure out the exact input

Re: html parser , unexpected '<' char in declaration

2006-02-20 Thread Sakcee
thanks for the reply well probably I should explain more. this is part of an email. after the mta delivers the email, it is stored in a local dir. After that the email is being parsed by the parser inside a web-based imap client at display time. I don't think I have the choice of rewriting the

Re: html parser , unexpected '<' char in declaration

2006-02-20 Thread Jesus Rivero - (Neurogeek)
Sakcee wrote: > html = > ' \r\n Foo foo , blah blah > ' > > html = """ Foo foo , blah blah """ Try checking your html code. It looks really messy. '

Re: html parser?

2005-10-19 Thread leonardr
To extract links without the overhead of Beautiful Soup, one option is to copy what Beautiful Soup does, and write a SGMLParser subclass that only looks at 'a' tags. In general I think writing SGMLParser subclasses is a big pain (which is why I wrote Beautiful Soup), but since you only care about o
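The idea of a parser subclass that only looks at 'a' tags can be sketched as follows. Note the substitution: sgmllib.SGMLParser no longer exists in Python 3, so this sketch uses html.parser.HTMLParser from the standard library instead, which has the same callback style. LinkExtractor and extract_links are illustrative names, not anything from Beautiful Soup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Record the href of every <a> tag and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(html):
    """Return the href values of all anchors, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an href attribute (for example named anchors) are skipped, so the result contains only actual link targets.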

Re: html parser?

2005-10-19 Thread Laszlo Zsolt Nagy
Thorsten Kampe wrote: >* Christoph Söllner (2005-10-18 12:20 +0100) > > >>right, that's what I was looking for. Thanks very much. >> >> > >For simple things like that "BeautifulSoup" might be overkill. > >import formatter, \ > htmllib, \ > urllib > >url = 'http://python.org'

Re: html parser?

2005-10-18 Thread Paul Boddie
Thorsten Kampe wrote: > For simple things like that "BeautifulSoup" might be overkill. [HTMLParser example] I've used SGMLParser with some success before, although the SAX-style processing is objectionable to many people. One alternative is to use libxml2dom [1] and to parse documents as HTML: i

Re: html parser?

2005-10-18 Thread Thorsten Kampe
* Christoph Söllner (2005-10-18 12:20 +0100) > right, that's what I was looking for. Thanks very much. For simple things like that "BeautifulSoup" might be overkill. import formatter, \ htmllib, \ urllib url = 'http://python.org' htmlp = htmllib.HTMLParser(formatter.NullForm

Re: html parser?

2005-10-18 Thread Christoph Söllner
right, that's what I was looking for. Thanks very much.

Re: html parser?

2005-10-18 Thread Laszlo Zsolt Nagy
Christoph Söllner wrote: >Hi *, > >is there a html parser available, which could i.e. extract all links from a >given text like that: >""" >BAR >BAR2 >""" > >and return a set of dicts like that: >""" >{ > ['foo.php','BAR','param1','test'], > ['foo2.php','BAR2','param1','test','param2','test'] >
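One possible way to produce records shaped like the ones in the question is sketched below with the standard library only. The input markup is an assumption, since the archive stripped the original anchor tags: the sketch guesses links of the form `<a href="foo.php?param1=test">BAR</a>`. LinkRecords and extract_link_records are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


class LinkRecords(HTMLParser):
    """Build [path, link text, key1, value1, ...] for each <a href=...>."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None   # href of the anchor being read, or None
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            parts = urlsplit(self._href)
            record = [parts.path, "".join(self._text).strip()]
            # Append query parameters as alternating name/value items.
            for key, value in parse_qsl(parts.query):
                record.extend([key, value])
            self.records.append(record)
            self._href = None


def extract_link_records(html):
    """Return one flat list per link, in document order."""
    parser = LinkRecords()
    parser.feed(html)
    parser.close()
    return parser.records
```

In the input, the ampersand between query parameters should be written as `&amp;`, as valid HTML requires; a bare `&param2` may be misread as a character reference.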