[EMAIL PROTECTED] wrote:
> I see there are a couple of tools I could use, and I also heard of
> sgmllib and htmllib. So now there are lxml, Beautiful Soup, sgmllib,
> htmllib ...
>
> Is there any of those tools that does the job I need to do more easily
> and what should I use? Maybe a combination o
I see there are a couple of tools I could use, and I also heard of
sgmllib and htmllib. So now there are lxml, Beautiful Soup, sgmllib,
htmllib ...
Is there any of those tools that does the job I need to do more easily
and what should I use? Maybe a combination of those tools, which one
is better fo
Stefan Behnel wrote:
> Jay Loden wrote:
>> Someone else mentioned lxml but as I understand it lxml will only work if
>> it's valid XHTML that they're working with.
>
> No, it was meant as the OP requested. It even has a very good parser for
> broken HTML.
>
> http://codespeak.net/lxml/dev/parsi
Jay Loden wrote:
> Someone else mentioned lxml but as I understand it lxml will only work if
> it's valid XHTML that they're working with.
No, it was meant as the OP requested. It even has a very good parser for
broken HTML.
http://codespeak.net/lxml/dev/parsing.html#parsing-html
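As a rough illustration of that point (a sketch only, assuming lxml is
installed and written for current Python 3; the sample markup and variable
names are made up here, not taken from the thread), lxml.html.fromstring
accepts markup with unclosed tags and still returns a tree you can walk:

```python
from lxml import html

# Deliberately broken markup: neither <p> is closed, and <b> is never closed.
broken = "<html><body><p>First paragraph<p>Second <b>bold text</body>"

# The HTML parser repairs the structure instead of raising an error.
doc = html.fromstring(broken)

# Collect the visible text of each paragraph, inline tags flattened.
paragraphs = [p.text_content().strip() for p in doc.iter("p")]

for text in paragraphs:
    print(text)
```

text_content() drops the markup and keeps only the text, which is close
to what is needed when copying page text out of a site.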
Stefan
--
htt
Neil Cerutti wrote:
> You could get good results, and save yourself some effort, using
> links or lynx with the command line options to dump page text to
> a file. Python would still be needed to automate calling links or
> lynx on all your documents.
OP was looking for a way to parse out part of
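For reference, the lynx approach Neil describes could be automated roughly
like this. It is a sketch under assumptions: lynx must be installed and on
the PATH, and the helper names lynx_dump_command and dump_page are
hypothetical. Note that it dumps the whole rendered page, not a selected
part of it.

```python
import subprocess


def lynx_dump_command(url):
    """Build the argument list for a text-only dump of one page."""
    # -dump writes the rendered page to stdout; -nolist omits the
    # numbered list of links that lynx otherwise appends.
    return ["lynx", "-dump", "-nolist", url]


def dump_page(url, out_path):
    """Run lynx (assumed to be on PATH) and save its output to out_path."""
    result = subprocess.run(
        lynx_dump_command(url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lynx exits with an error
    )
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
```

Calling dump_page once per URL in a loop would cover a whole site, given a
list of page addresses.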
On 2007-06-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they fi
Hi,
I work at this company and we are re-building our website: http://caslt.org/.
The new website will be built by an external firm (I could do it
myself, but since I'm just the summer student worker...). Anyways, to
help them, they first asked me to copy all the text from all the pages
of the sit
[EMAIL PROTECTED] wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they first asked me to copy all the text f
[EMAIL PROTECTED] wrote:
> So, I'm writing this to have your opinion on what tools I should use
> to do this and what technique I should use.
Take a look at the parsing example on this page:
http://wiki.python.org/moin/SimplePrograms
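In the same spirit, here is a minimal sketch that uses only the standard
library's html.parser (Python 3). It is not the example from the wiki page;
the class name TextExtractor, the helper extract_text and the sample markup
are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(markup):
    """Return the text chunks of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Welcome</h1><p>Some <b>bold</b> text.</p></body></html>"
)
print(extract_text(sample))
```

This keeps each run of text as its own line, so inline tags such as <b>
split a sentence; joining chunks per block element would need extra
bookkeeping.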
--
HTH,
Rob
--
http://mail.python.org/mailman/listinfo/python-l