Hi,
I have a bunch of HTML files from which I've stripped the presentation
with BeautifulSoup (keeping only a content div with the bare content).
I've received a PHP template for the new site from the company we work
with, so I took the part of my first script that
iterates through a g
Hi,
I have a bunch of files that have changed from standard .htm files to
.php files, but all the links inside the site are now broken because
they still point to the .htm files.
Does anyone have an idea how to write a simple script that changes
each .htm in a given file t
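
One possible approach to the question above, offered as a minimal sketch rather than a definitive answer: a regular expression that rewrites only `href` values ending in `.htm`. The function name `htm_to_php` and the decision to leave absolute URLs and `.html` links untouched are assumptions made for illustration; they are not part of the original post.

```python
import re

# Match href="something.htm", optionally followed by a #fragment or ?query.
# Assumption: attribute values are quoted; ".html" links are left untouched.
HREF_HTM = re.compile(
    r"""(href\s*=\s*)(["'])([^"'#?]*?)\.htm(?=[#?"'])""",
    re.IGNORECASE,
)

def htm_to_php(html):
    """Return html with local .htm links rewritten to .php."""
    def swap(match):
        prefix, quote, path = match.group(1), match.group(2), match.group(3)
        # Leave absolute URLs (http://, https://, //host) as they are.
        if re.match(r"(?i)(https?:)?//", path):
            return match.group(0)
        return f"{prefix}{quote}{path}.php"
    return HREF_HTM.sub(swap, html)
```

Restricting the substitution to `href` attributes avoids touching a ".htm" that merely appears in the visible text of a page.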
Hi,
I'm in the process of cleaning some HTML files with BeautifulSoup and
I want to remove all traces of the tables. Here is the bit of code
that deals with tables:
def remove(soup, tagname):
    for tag in soup.findAll(tagname):
        contents = tag.contents
        parent = tag.parent
>
> Then take hold of the contents and add them to the parent. Something like
> this should work:
>
> from BeautifulSoup import BeautifulSoup
>
> def remove(soup, tagname):
>     for tag in soup.findAll(tagname):
>         contents = tag.contents
>         parent = tag.parent
>         tag.extract()
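
The idea in the quoted reply (drop a tag but keep what is inside it) can also be sketched without BeautifulSoup. The example below swaps in the standard library's `html.parser` so that it runs with no third-party install; it is not the code from the thread. The names `TagStripper` and `strip_tags` and the default list of table tags are assumptions for illustration, and processing instructions are not handled.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Re-emit HTML, dropping the named tags but keeping their contents."""

    def __init__(self, drop):
        super().__init__(convert_charrefs=False)
        self.drop = {name.lower() for name in drop}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            # get_starttag_text() returns the tag exactly as written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_tags(html, drop=("table", "tbody", "tr", "td", "th")):
    """Return html with the tags in drop removed and their contents kept."""
    parser = TagStripper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_tags("<table><tr><td><p>Hi</p></td></tr></table>")` leaves only the paragraph.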
Hi,
I'm writing a little script with the help of the BeautifulSoup HTML
parser and uTidyLib (an HTML Tidy wrapper for Python).
Essentially, it fetches all the HTML files in a given
directory (and its subdirectories), cleans the code with Tidy (removes
deprecated tags, changes the output to be x
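
The first step described above, collecting every HTML file under a directory and its subdirectories, might look like the following sketch. It uses only `pathlib` from the standard library; the helper name `iter_html_files` and the choice of the `.htm` and `.html` extensions are assumptions, not details from the post.

```python
from pathlib import Path

def iter_html_files(root):
    """Yield every .htm/.html file under root, including subdirectories."""
    # rglob("*") walks the whole tree; sorting keeps the order predictable.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            yield path
```

Each yielded path could then be read, cleaned, and written back in a loop.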
Thank you guys for all the good advice.
I'll be working on defining a clearer problem (I think this advice is
good for all areas of life).
I appreciate the help. The Python community looks really open to
learners and beginners, and I hope to be helping people myself before too
long (well, reaso
Hi,
I'm in the process of refactoring a lot of HTML documents and I'm
using HTML Tidy to do part of this
work (cleaning up, converting to XHTML and removing font and center tags).
Now, Tidy will only do part of the work I need to
do; I also have to remove all the presentational tags and attributes from
the
I see there are a couple of tools I could use, and I have also heard of
sgmllib and htmllib. So now there are lxml, BeautifulSoup, sgmllib,
htmllib ...
Do any of those tools do the job I need more easily,
and which should I use? Maybe a combination of those tools; which one
is better fo
Hi,
I work at this company and we are re-building our website: http://caslt.org/.
The new website will be built by an external firm (I could do it
myself, but since I'm just the summer student worker...). Anyway, to
help them, they first asked me to copy all the text from all the pages
of the sit