On 2 July 2013 18:43, wrote:
> I could not use BeautifulSoup as I did not find an .exe file.
Were you perhaps looking for a .exe file to install BeautifulSoup?
It's quite plausible that a Windows user like you might be puzzled by
the idea of a .tar.gz.
I suggest just using "pip install beautifulsoup4".
On Tue, 02 Jul 2013 10:43:03 -0700, subhabangalore wrote:
> I could not use BeautifulSoup as I did not find an .exe file.
I believe that BeautifulSoup is a pure-Python module, and so does not
have a .exe file. However, it does have good tutorials:
https://duckduckgo.com/html/?q=beautifulsoup+tu
On 2013-07-02, subhabangal...@gmail.com wrote:
> Dear Group,
>
> I was looking for a good tutorial for a "HTML Parser". My
> intention was to extract tables from web pages or information
> from tables in web pages.
>
> I tried a search and found HTMLParser, BeautifulSoup, etc.
> HTMLParser w
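For the table-extraction task the poster describes, a minimal BeautifulSoup sketch might look like the following. This assumes the third-party beautifulsoup4 package is installed, and the HTML snippet is invented purely for illustration:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Invented HTML snippet standing in for a scraped web page.
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>10</td></tr>
  <tr><td>Bob</td><td>7</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find_all("tr"):
    # get_text(strip=True) trims whitespace inside each cell
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

print(rows)  # [['Name', 'Score'], ['Alice', '10'], ['Bob', '7']]
```

The same loop works unchanged on a real page fetched with urllib; only the source of the HTML string differs.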
"Robert" wrote in message
news:hk729b$na...@news.albasani.net...
> Stefan Behnel wrote:
>> Robert, 01.02.2010 14:36:
>>> Stefan Behnel wrote:
>>>> Robert, 31.01.2010 20:57:
>>>>> I tried lxml, but after walking and making changes in the element
>>>>> tree,
>>>>> I'm forced to do a full serializ
On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:
> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code.
> makes it rather unreadable.
Robert wrote:
> I think you confused the logical level of what I meant with "file
> position":
> Of course its not about (necessarily) writing back to the same open file
> (OS-level), but regarding the whole serializiation string (wherever it
> is finally written to - I typically write the auto-con
Stefan Behnel wrote:
> Robert, 01.02.2010 14:36:
>> Stefan Behnel wrote:
>>> Robert, 31.01.2010 20:57:
>>>> I tried lxml, but after walking and making changes in the element tree,
>>>> I'm forced to do a full serialization of the whole document
>>>> (etree.tostring(tree)) - which destroys the "human edited" format of
>>>> the original HTML code.
Robert, 01.02.2010 14:36:
> Stefan Behnel wrote:
>> Robert, 31.01.2010 20:57:
>>> I tried lxml, but after walking and making changes in the element tree,
>>> I'm forced to do a full serialization of the whole document
>>> (etree.tostring(tree)) - which destroys the "human edited" format of the
>>> original HTML code. makes it rather unreadable.
Robert wrote:
> Stefan Behnel wrote:
>> Robert, 31.01.2010 20:57:
>>> I tried lxml, but after walking and making changes in the element tree,
>>> I'm forced to do a full serialization of the whole document
>>> (etree.tostring(tree)) - which destroys the "human edited" format of the
>>> original HTML code. makes it rather unreadable.
Stefan Behnel wrote:
> Robert, 31.01.2010 20:57:
>> I tried lxml, but after walking and making changes in the element tree,
>> I'm forced to do a full serialization of the whole document
>> (etree.tostring(tree)) - which destroys the "human edited" format of the
>> original HTML code. makes it rather unreadable.
Robert, 31.01.2010 20:57:
> I tried lxml, but after walking and making changes in the element tree,
> I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format of the
> original HTML code. makes it rather unreadable.
What do you
On Oct 17, 9:50 am, Carsten Haese <[EMAIL PROTECTED]> wrote:
> Recent releases of BeautifulSoup need Python 2.3+, so they won't work on
> current Jython, but BeautifulSoup 1.x will work.
Thank you.
--
http://mail.python.org/mailman/listinfo/python-list
> Does anybody know of a decent HTML parser for Jython? I have to do
> some screen scraping, and would rather use a tested module instead of
> rolling my own.
GIYF[0][1]
There are the batteries-included HTMLParser[2] and htmllib[3]
modules, and the ever-popular (and more developer-friendly)
BeautifulSoup.
On Wed, 2007-10-17 at 17:36 +0200, Stefan Behnel wrote:
> Falcolas wrote:
> > Does anybody know of a decent HTML parser for Jython? I have to do
> > some screen scraping, and would rather use a tested module instead of
> > rolling my own.
>
> Not sure if it works, but have you tried BeautifulSoup?
Falcolas wrote:
> Does anybody know of a decent HTML parser for Jython? I have to do
> some screen scraping, and would rather use a tested module instead of
> rolling my own.
Not sure if it works, but have you tried BeautifulSoup? Or maybe an older
version of it?
Stefan
--
http://mail.python.org
In article <[EMAIL PROTECTED]>,
Stephen R Laniel <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 15, 2007 at 07:11:56AM -0700, HMS Surprise wrote:
> > Could you recommend an html parser that works with python (jython
> > 2.2)?
>
> I'm new here, but I believe BeautifulSoup is the canonical
> answer:
>
On Jun 15, 7:11 am, HMS Surprise <[EMAIL PROTECTED]> wrote:
> Could you recommend an html parser that works with python (jython
> 2.2)? HTMLParser does not seem to be in this library. To test some
> of our browser-based (mainly PHP) code I search for field names and
> the values associated with them.
Thanks,
jh
On Fri, Jun 15, 2007 at 07:11:56AM -0700, HMS Surprise wrote:
> Could you recommend an html parser that works with python (jython
> 2.2)?
I'm new here, but I believe BeautifulSoup is the canonical
answer:
http://www.crummy.com/software/BeautifulSoup/
--
Stephen R. Laniel
[EMAIL PROTECTED]
Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/
Works, well...beautifully.
On Apr 6, 1:05 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Is there a HTML parser (not xml) in python?
> I need a html parser which has the ability to handle mal-format html
> pages.
>
> Thank you.
Yeah...it's called Beautiful Soup.
http://www.crummy.com/software/BeautifulSoup/
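To illustrate the point about mal-formatted pages: Beautiful Soup repairs broken markup instead of raising, so unclosed tags still parse. A toy sketch, assuming the beautifulsoup4 package is installed and using deliberately broken invented HTML:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Deliberately broken HTML: the <p> and <b> tags are never closed.
bad_html = "<html><body><p>first para<p>second <b>bold text</body>"

soup = BeautifulSoup(bad_html, "html.parser")

# The parser recovers the unclosed <b> element and its text anyway.
print(soup.find("b").get_text())
```

A strict XML parser would reject this input outright; tolerance for tag soup like this is exactly why the library keeps being recommended in these threads.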
Oops!
You are totally right, guys. I did miss the closing '>' while thinking
about possible errors in the use of ' or ".
Jesus
Tim Roberts wrote:
>"Jesus Rivero - (Neurogeek)" <[EMAIL PROTECTED]> wrote:
>
>
>>hmmm, that's kind of a different issue, then.
>>
>>I can guess, from the error you pasted earlier
Thanks for the suggestions.
This is not happening frequently; actually, this is the first time I
have seen this exception in the system, which means that some spam
message was generated with ill-formatted HTML.
I guess the best way would be to check using a regular expression and
delete the unclosed tags.
"Jesus Rivero - (Neurogeek)" <[EMAIL PROTECTED]> wrote:
>
>hmmm, that's kind of a different issue, then.
>
>I can guess, from the error you pasted earlier, that the problem shown
>is due to the fact that Python is interpreting a "<" as an expression and not
>as a char. Review your code or try to figure out
hmmm, that's kind of a different issue, then.
I can guess, from the error you pasted earlier, that the problem shown
is due to the fact that Python is interpreting a "<" as an expression and not
as a char. Review your code or try to figure out the exact input
Thanks for the reply.
Well, probably I should explain more. This is part of an email. After
the MTA delivers the email, it is stored in a local dir.
After that, the email is parsed by the parser inside a web-based
IMAP client at display time.
I don't think I have the choice of rewriting the
Sakcee wrote:
> html =
> ' \r\n Foo foo , blah blah
> '
>
>
html =
"""
Foo foo , blah blah
"""
Try checking your html code. It looks really messy.
To extract links without the overhead of Beautiful Soup, one option is
to copy what Beautiful Soup does, and write a SGMLParser subclass that
only looks at 'a' tags. In general I think writing SGMLParser
subclasses is a big pain (which is why I wrote Beautiful Soup), but
since you only care about o
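SGMLParser is long gone (it was removed in Python 3), but the same idea of a subclass that only looks at 'a' tags translates directly to the standard library's html.parser. A minimal sketch, with an invented HTML snippet for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<p>See <a href="http://python.org">Python</a> and '
            '<a href="http://crummy.com">crummy</a>.</p>')
print(parser.links)  # ['http://python.org', 'http://crummy.com']
```

Because only one start-tag handler is overridden, there is none of the boilerplate that makes general SGMLParser subclasses a pain to write.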
Thorsten Kampe wrote:
>* Christoph Söllner (2005-10-18 12:20 +0100)
>
>
>>right, that's what I was looking for. Thanks very much.
>>
>>
>
>For simple things like that "BeautifulSoup" might be overkill.
>
>import formatter, \
> htmllib, \
> urllib
>
>url = 'http://python.org'
Thorsten Kampe wrote:
> For simple things like that "BeautifulSoup" might be overkill.
[HTMLParser example]
I've used SGMLParser with some success before, although the SAX-style
processing is objectionable to many people. One alternative is to use
libxml2dom [1] and to parse documents as HTML:
i
* Christoph Söllner (2005-10-18 12:20 +0100)
> right, that's what I was looking for. Thanks very much.
For simple things like that "BeautifulSoup" might be overkill.
import formatter, \
htmllib, \
urllib
url = 'http://python.org'
htmlp = htmllib.HTMLParser(formatter.NullFormatter())
htmlp.feed(urllib.urlopen(url).read())
htmlp.close()
print htmlp.anchorlist
right, that's what I was looking for. Thanks very much.
Christoph Söllner wrote:
>Hi *,
>
>is there a html parser available, which could i.e. extract all links from a
>given text like that:
>"""
><a href="foo.php?param1=test">BAR</a>
><a href="foo2.php?param1=test&param2=test">BAR2</a>
>"""
>
>and return a set of dicts like that:
>"""
>{
> ['foo.php','BAR','param1','test'],
> ['foo2.php','BAR2','param1','test','param2','test']
>
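For the param1/param2 part of the question, the query string of each extracted href can be split with the standard library's urllib.parse; no HTML parsing is needed once the href is in hand. A sketch using the question's invented foo2.php URL:

```python
from urllib.parse import urlsplit, parse_qsl

href = "foo2.php?param1=test&param2=test"

parts = urlsplit(href)
# parse_qsl returns the query string as ordered (name, value) pairs
params = parse_qsl(parts.query)

# Flatten into the [path, param, value, ...] shape the question asks for
record = [parts.path] + [item for pair in params for item in pair]
print(record)  # ['foo2.php', 'param1', 'test', 'param2', 'test']
```

The link text ('BAR2' in the question's example) would come from whatever HTML parser extracts the anchors; only the href handling is shown here.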