[EMAIL PROTECTED] wrote:
> I see there are a couple of tools I could use, and I have also heard of
> sgmllib and htmllib. So now there are lxml, Beautiful Soup, sgmllib,
> htmllib ...
>
> Is there any of those tools that does the job I need to do more easily
> and what should I use? Maybe a combination o
I see there are a couple of tools I could use, and I have also heard of
sgmllib and htmllib. So now there are lxml, Beautiful Soup, sgmllib,
htmllib ...
Is there any of those tools that does the job I need to do more easily
and what should I use? Maybe a combination of those tools, which one
is better fo
Stefan Behnel wrote:
> Jay Loden wrote:
>> Someone else mentioned lxml but as I understand it lxml will only work if
>> it's valid XHTML that they're working with.
>
> No, it was meant as the OP requested. It even has a very good parser for
> broken HTML.
>
> http://codespeak.net/lxml/dev/parsi
Jay Loden wrote:
> Someone else mentioned lxml but as I understand it lxml will only work if
> it's valid XHTML that they're working with.
No, it was meant as the OP requested. It even has a very good parser for
broken HTML.
http://codespeak.net/lxml/dev/parsing.html#parsing-html
Stefan
--
htt
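
To illustrate the point about broken HTML, here is a minimal sketch, assuming
lxml is installed. The helper name paragraph_texts and the sample markup are
made up for this example and are not from the thread. The input has unclosed
<p> and <b> tags, and the parser is expected to recover from that.

```python
import lxml.html


def paragraph_texts(html):
    """Return the stripped text of every <p> element in an HTML string."""
    # lxml.html.fromstring uses a forgiving HTML parser, so missing
    # closing tags do not raise an error.
    root = lxml.html.fromstring(html)
    # text_content() joins the text of an element and all its children.
    return [p.text_content().strip() for p in root.iter("p")]


# Hypothetical sample: neither <p> nor <b> is ever closed.
broken = "<html><body><p>First paragraph<p>Second <b>bold text</body>"
for text in paragraph_texts(broken):
    print(text)
```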
Neil Cerutti wrote:
> You could get good results, and save yourself some effort, using
> links or lynx with the command line options to dump page text to
> a file. Python would still be needed to automate calling links or
> lynx on all your documents.
OP was looking for a way to parse out part of
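
For reference, Neil's suggestion of letting Python drive lynx could look
roughly like the sketch below. It assumes lynx is on the PATH; the function
names, the output file naming, and the choice of -nolist (which leaves out the
numbered link list) are illustrative assumptions, not something from the
thread.

```python
import subprocess
from pathlib import Path


def lynx_dump_command(url):
    """Build the lynx command line that prints a page as plain text."""
    # -dump writes the rendered page to stdout instead of opening the UI;
    # -nolist omits the list of links lynx would otherwise append.
    return ["lynx", "-dump", "-nolist", url]


def dump_pages(urls, out_dir):
    """Run lynx on each URL and save the text as out_dir/page_NNN.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        # check=True raises CalledProcessError if lynx exits with an error.
        result = subprocess.run(
            lynx_dump_command(url),
            capture_output=True,
            text=True,
            check=True,
        )
        (out_dir / f"page_{i:03d}.txt").write_text(result.stdout, encoding="utf-8")
```

Calling dump_pages(["http://caslt.org/"], "dumped_text") would write one text
file per URL. This dumps the whole page, so picking out only part of each page
would still need a parser.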
On 2007-06-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they fi
[EMAIL PROTECTED] wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they first asked me to copy all the text f
[EMAIL PROTECTED] wrote:
> So, I'm writing this to have your opinion on what tools I should use
> to do this and what technique I should use.
Take a look at the parsing example on this page:
http://wiki.python.org/moin/SimplePrograms
--
HTH,
Rob
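
This is not the example from the page linked above. It is a separate minimal
sketch of the same kind of job using only the standard library's html.parser
module. The class and function names are made up for illustration. It collects
the visible text of a page and skips anything inside <script> or <style>.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style.
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "\n".join(parser.chunks)


# Hypothetical sample page; the <p> is never closed.
sample = "<html><body><h1>Title</h1><p>Hello <b>world</b><script>var x=1;</script></body>"
print(extract_text(sample))
```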
--
http://mail.python.org/mailman/listinfo/python-l