Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
> I see there is a couple of tools I could use, and I also heard of
> sgmllib and htmllib. So now there is lxml, Beautiful soup, sgmllib,
> htmllib ...
>
> Is there any of those tools that does the job I need to do more easily
> and what should I use? Maybe a combination o

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread sebzzz
I see there is a couple of tools I could use, and I also heard of sgmllib and htmllib. So now there is lxml, Beautiful soup, sgmllib, htmllib ... Is there any of those tools that does the job I need to do more easily and what should I use? Maybe a combination of those tools, which one is better fo

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Jay Loden
Stefan Behnel wrote:
> Jay Loden wrote:
>> Someone else mentioned lxml but as I understand it lxml will only work if
>> it's valid XHTML that they're working with.
>
> No, it was meant as the OP requested. It even has a very good parser for
> broken HTML.
>
> http://codespeak.net/lxml/dev/parsi

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
Jay Loden wrote:
> Someone else mentioned lxml but as I understand it lxml will only work if
> it's valid XHTML that they're working with.
No, it was meant as the OP requested. It even has a very good parser for broken HTML.
http://codespeak.net/lxml/dev/parsing.html#parsing-html
Stefan -- htt

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Jay Loden
Neil Cerutti wrote:
> You could get good results, and save yourself some effort, using
> links or lynx with the command line options to dump page text to
> a file. Python would still be needed to automate calling links or
> lynx on all your documents.
OP was looking for a way to parse out part of
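For readers following the links/lynx suggestion quoted above, here is a minimal sketch of how Python might automate the call. It assumes a text browser that accepts `-dump` (lynx and links both do) is installed and on the PATH; the helper names `build_dump_command` and `dump_text` are made up for illustration and do not come from the thread.

```python
import subprocess


def build_dump_command(path, browser="lynx"):
    """Return the argument list that asks a text browser to dump a page as text."""
    # "-dump" tells lynx (and links) to print the rendered page and exit.
    return [browser, "-dump", path]


def dump_text(path, browser="lynx"):
    """Run the browser on one document and return the dumped text.

    Assumes the browser executable is installed; raises
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(
        build_dump_command(path, browser),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `dump_text` in a loop over every saved page would cover the "all your documents" part. Note that this only extracts text; it does not change any attributes in the HTML.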

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Neil Cerutti
On 2007-06-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they fi

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
> I work at this company and we are re-building our website: http://caslt.org/.
> The new website will be built by an external firm (I could do it
> myself, but since I'm just the summer student worker...). Anyways, to
> help them, they first asked me to copy all the text f

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread Rob Wolfe
[EMAIL PROTECTED] wrote:
> So, I'm writing this to have your opinion on what tools I should use
> to do this and what technique I should use.
Take a look at parsing example on this page: http://wiki.python.org/moin/SimplePrograms
-- HTH, Rob -- http://mail.python.org/mailman/listinfo/python-l
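
As a rough illustration of the task named in the thread subject (extracting text and changing attributes), here is a small sketch built on the standard library's `html.parser`. It is not the example from the wiki page linked above and not code posted by anyone in the thread; the class name `TextAndAttrRewriter` and the helper `rewrite` are hypothetical. It also drops comments and doctype declarations and re-escapes all text, so treat it as a starting point only.

```python
from html import escape
from html.parser import HTMLParser


class TextAndAttrRewriter(HTMLParser):
    """Collect visible text and re-emit the markup with one attribute changed."""

    def __init__(self, attr_name, transform):
        super().__init__(convert_charrefs=True)
        self.attr_name = attr_name    # attribute to rewrite, e.g. "href"
        self.transform = transform    # function mapping old value -> new value
        self.text_parts = []          # stripped, non-empty text chunks
        self.out = []                 # pieces of the rewritten markup

    def _format_attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                value = self.transform(value)
            if value is None:
                parts.append(" " + name)  # valueless attribute, e.g. "disabled"
            else:
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s%s>" % (tag, self._format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s%s />" % (tag, self._format_attrs(attrs)))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())
        # Character references were decoded by the parser, so escape again.
        self.out.append(escape(data, quote=False))


def rewrite(html_text, attr_name, transform):
    """Return (rewritten_markup, list_of_text_chunks) for one document."""
    parser = TextAndAttrRewriter(attr_name, transform)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out), parser.text_parts
```

For example, `rewrite(page, "href", lambda v: v.replace("old", "new"))` would return the page with every `href` value adjusted, together with the list of text chunks found along the way. For badly broken markup, the lxml and Beautiful Soup options discussed in the thread are likely to be more forgiving.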