Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
> I see there is a couple of tools I could use, and I also heard of
> sgmllib and htmllib. So now there is lxml, Beautiful soup, sgmllib,
> htmllib ...
>
> Is there any of those tools that does the job I need to do more easily
> and what should I use? Maybe a combination o

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread sebzzz
I see there is a couple of tools I could use, and I also heard of sgmllib and htmllib. So now there is lxml, Beautiful soup, sgmllib, htmllib ... Is there any of those tools that does the job I need to do more easily and what should I use? Maybe a combination of those tools, which one is better fo

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Jay Loden
Stefan Behnel wrote:
> Jay Loden wrote:
>> Someone else mentioned lxml but as I understand it lxml will only work if
>> it's valid XHTML that they're working with.
>
> No, it was meant as the OP requested. It even has a very good parser from
> broken HTML.
>
> http://codespeak.net/lxml/dev/parsi

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
Jay Loden wrote:
> Someone else mentioned lxml but as I understand it lxml will only work if
> it's valid XHTML that they're working with.

No, it was meant as the OP requested. It even has a very good parser from broken HTML.
http://codespeak.net/lxml/dev/parsing.html#parsing-html

Stefan

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Jay Loden
Neil Cerutti wrote:
> You could get good results, and save yourself some effort, using
> links or lynx with the command line options to dump page text to
> a file. Python would still be needed to automate calling links or
> lynx on all your documents.

OP was looking for a way to parse out part of

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Neil Cerutti
On 2007-06-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they fi

Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread sebzzz
Hi, I work at this company and we are re-building our website: http://caslt.org/. The new website will be built by an external firm (I could do it myself, but since I'm just the summer student worker...). Anyways, to help them, they first asked me to copy all the text from all the pages of the sit

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they first asked me to copy all the text f

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Rob Wolfe
[EMAIL PROTECTED] wrote:
> So, I'm writing this to have your opinion on what tools I should use
> to do this and what technique I should use.

Take a look at the parsing example on this page: http://wiki.python.org/moin/SimplePrograms

HTH,
Rob
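
For readers landing on this thread today, here is a minimal sketch of the two tasks in the subject line (extracting text and changing attributes), using only the standard library's html.parser. It is not the example from the wiki page above and not the lxml approach recommended in the thread. The class name TextAndAttrRewriter, the helper rewrite_html, and the sample "class" mapping are all hypothetical names chosen for illustration, and the serialisation is deliberately simple: tag names are lowercased and processing instructions are dropped.

```python
from html import escape
from html.parser import HTMLParser


class TextAndAttrRewriter(HTMLParser):
    """Collect visible text and re-emit markup with one attribute remapped.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, attr_name, mapping):
        super().__init__(convert_charrefs=True)
        self.attr_name = attr_name   # attribute to rewrite, e.g. "class"
        self.mapping = mapping       # old value -> new value
        self.text_parts = []         # visible text chunks, whitespace-stripped
        self.out = []                # pieces of the re-serialised markup
        self._raw_depth = 0          # > 0 while inside <script> or <style>

    def _render(self, tag, attrs, end=">"):
        parts = []
        for name, value in attrs:
            if name == self.attr_name:
                value = self.mapping.get(value, value)
            if value is None:        # valueless attribute such as "disabled"
                parts.append(" " + name)
            else:
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "<" + tag + "".join(parts) + end

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._raw_depth += 1
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, end=" />"))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._raw_depth:
            self._raw_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self._raw_depth:
            # Script and style bodies are passed through untouched.
            self.out.append(data)
        else:
            self.out.append(escape(data, quote=False))
            if data.strip():
                self.text_parts.append(data.strip())

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite_html(source, attr_name, mapping):
    """Return (visible_text, rewritten_markup) for an HTML string."""
    parser = TextAndAttrRewriter(attr_name, mapping)
    parser.feed(source)
    parser.close()
    return " ".join(parser.text_parts), "".join(parser.out)


# The <b> element is never closed; the parser still reports what it sees.
text, html_out = rewrite_html(
    '<p class="old">Hello <b>world</p>', "class", {"old": "new"}
)
print(text)      # Hello world
print(html_out)  # <p class="new">Hello <b>world</p>
```

Note that html.parser only tokenises; it does not repair broken markup, so the unclosed <b> stays unclosed in the output. A tree-building parser such as the lxml HTML parser linked earlier in the thread is the better fit when the pages need to be cleaned up as well.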