Problem from complex string messing up

2007-08-23 Thread sebzzz
Hi, I have this bunch of HTML files from which I've stripped the presentation with BeautifulSoup (keeping only a content div with the bare content). I've received a PHP template for the new site from the company we work with, so I went on taking the same part of my first script that iterates through a g

Using Regular Expressions to change .htm to .php in files

2007-08-23 Thread sebzzz
Hi, I have a bunch of files that have changed from standard htm files to php files but all the links inside the site are now broken because they point to the .htm files while they are now .php files. Does anyone have an idea about how to do a simple script that changes each .htm in a given file t
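
A minimal sketch of one way to approach the question above, using only Python's standard `re` and `pathlib` modules. The function names `htm_to_php` and `rewrite_links` are hypothetical, and the pattern assumes that a link target ending in `.htm` is followed by a quote, `#`, `?`, `>`, whitespace, or the end of the text, so that `.html` is left untouched. It is not taken from the thread itself.

```python
import re
from pathlib import Path

# Match ".htm" only where a link target ends, so ".html" is not rewritten.
# The set of terminating characters is an assumption about the markup.
HTM_LINK = re.compile(r"""\.htm(?=["'#?\s>]|$)""")


def htm_to_php(text):
    """Return text with every terminating '.htm' replaced by '.php'."""
    return HTM_LINK.sub(".php", text)


def rewrite_links(path, encoding="utf-8"):
    """Rewrite one file in place (hypothetical helper; back up files first)."""
    p = Path(path)
    p.write_text(htm_to_php(p.read_text(encoding=encoding)), encoding=encoding)
```

Calling `htm_to_php('<a href="about.htm">')` gives `'<a href="about.php">'`, while a target such as `page.html` is left as it is.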

Removing tags with BeautifulSoup

2007-08-08 Thread sebzzz
Hi, I'm in the process of cleaning some html files with BeautifulSoup and I want to remove all traces of the tables. Here is the bit of the code that deals with tables:

    def remove(soup, tagname):
        for tag in soup.findAll(tagname):
            contents = tag.contents
            parent = tag.parent
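
The idea in this post, dropping the table tags while keeping what they contain, can be illustrated without BeautifulSoup. The sketch below swaps in the standard library's `html.parser` instead of the BeautifulSoup approach used in the thread; the names `TagStripper` and `strip_tags` and the default list of tags are assumptions made for the example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML, omitting the listed tags but keeping their contents."""

    def __init__(self, drop):
        # convert_charrefs=False so entities pass through unchanged.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)  # tag names, lowercase
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def strip_tags(html, drop=("table", "tbody", "tr", "td", "th")):
    """Return html with the tags in drop removed and their contents kept."""
    parser = TagStripper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_tags('<div><table><tr><td><p>Hi</p></td></tr></table></div>')` returns `'<div><p>Hi</p></div>'`.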

Re: Removing certain tags from html files

2007-07-27 Thread sebzzz
> Then take hold of the contents and add them to the parent. Something like
> this should work:
>
> from BeautifulSoup import BeautifulSoup
>
> def remove(soup, tagname):
>     for tag in soup.findAll(tagname):
>         contents = tag.contents
>         parent = tag.parent
>         tag.extract()

Removing certain tags from html files

2007-07-27 Thread sebzzz
Hi, I'm doing a little script with the help of the BeautifulSoup HTML parser and uTidyLib (an HTML Tidy wrapper for Python). Essentially what it does is fetch all the HTML files in a given directory (and its subdirectories), clean the code with Tidy (removes deprecated tags, changes the output to be x

Re: Right tool and method to strip off html files (python, sed, awk?)

2007-07-15 Thread sebzzz
Thank you guys for all the good advice. I'll be working on defining a clearer problem (I think this advice is good for all areas of life). I appreciate the help; the Python community looks really open to learners and beginners, and I hope to be helping people myself before too long (well, reaso

Right tool and method to strip off html files (python, sed, awk?)

2007-07-13 Thread sebzzz
Hi, I'm in the process of refactoring a lot of HTML documents and I'm using HTML Tidy to do part of this work (clean up, change to XHTML and remove font and center tags). Now, Tidy will only do part of the work I need to do; I have to remove all the presentational tags and attributes from the

Re: Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread sebzzz
I see there are a couple of tools I could use, and I also heard of sgmllib and htmllib. So now there is lxml, Beautiful Soup, sgmllib, htmllib ... Do any of those tools do the job I need more easily, and which should I use? Maybe a combination of those tools, which one is better fo

Parsing HTML, extracting text and changing attributes.

2007-06-18 Thread sebzzz
Hi, I work at this company and we are rebuilding our website: http://caslt.org/. The new website will be built by an external firm (I could do it myself, but since I'm just the summer student worker...). Anyway, to help them, they first asked me to copy all the text from all the pages of the sit