I'm working on a program to remove tags from an HTML document, leaving
just the content, but I want to do it simply. I've finished a system
to remove simple tags, but I also want all CSS and JS to be removed.
What re pattern could I use to do that?
I've tried
''
but that didn't work properly. I'm fai
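
For reference, here is a minimal sketch of one way the question could be approached with the re module. It is not the pattern from the post (which did not survive) and it is not a full HTML parser. The names SCRIPT_STYLE_RE, TAG_RE and strip_markup are made up for illustration, and it assumes script and style blocks are properly closed and not nested.

```python
import re

# Matches a whole <script>...</script> or <style>...</style> block,
# contents included. DOTALL lets "." span newlines; IGNORECASE also
# catches <SCRIPT> and <Style>. (Hypothetical names, illustration only.)
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)

# Matches any remaining tag, such as <p> or </div>.
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(html):
    """Remove script/style blocks first, then all remaining tags."""
    without_code = SCRIPT_STYLE_RE.sub("", html)
    return TAG_RE.sub("", without_code)


sample = "<p>Hello</p><style>p { color: red; }</style><script>alert(1);</script> world"
print(strip_markup(sample))
```

The order matters: if the generic tag pattern ran first, the CSS and JS text between the tags would be left behind as content.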
I haven't personally used freeze (Kubuntu doesn't seem to install it
with the Python debs), but based on what I know of it, it generates
makefiles. I'm not a make expert, but if FreeBSD has the GNU tools,
freeze's output _should_ compile on FreeBSD.
On Dec 15, 5:52 am, robert <[EMAIL PROT
Thank you! Fixed my problem perfectly!
Gabriel Genellina wrote:
> At Thursday 9/11/2006 20:23, i80and wrote:
>
> >I'm working on a basic web spider, and I'm having problems with the
> >urlparser.
> >[...]
> > SpliceStart = Website.find('&
I'm working on a basic web spider, and I'm having problems with the
urlparser.
This is the affected function:
--
def FindLinks(Website):
    WebsiteLen = len(Website)+1
    CurrentLink = ''
    i = 0
    SpliceStart = 0
    SpliceEnd = 0
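
The rest of the function is cut off above, so the following is not a completion of it. It is a rough sketch of a different technique: collecting href values with the standard library's html.parser instead of slicing the string by hand. LinkCollector and find_links are hypothetical names, and the sample page is invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(website):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(website)
    collector.close()
    return collector.links


page = '<a href="http://example.com/">one</a> <A HREF="/two">two</A> <a name="x">none</a>'
print(find_links(page))
```

Letting the parser track tag boundaries avoids the index bookkeeping (SpliceStart, SpliceEnd) that makes hand-written scanning easy to get wrong.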
I would suggest using string.replace. Simply replace ' ' with ' '
wherever it occurs. It doesn't take much code.
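
A small sketch of that replace approach, assuming the titles contain entities such as &gt; and &nbsp; (the exact entities in the original titles did not survive, so the ENTITIES list here is a guess for illustration). decode_with_replace is a hypothetical name. The standard library's html.unescape is shown alongside as an alternative that handles named and numeric entities in general.

```python
import html

# Assumed (entity, character) pairs, for illustration only.
# "&amp;" goes last so that "&amp;gt;" becomes "&gt;" and is not decoded twice.
ENTITIES = [
    ("&gt;", ">"),
    ("&lt;", "<"),
    ("&nbsp;", " "),
    ("&amp;", "&"),
]


def decode_with_replace(title):
    """Replace each known entity with the character it stands for."""
    for entity, char in ENTITIES:
        title = title.replace(entity, char)
    return title


print(decode_with_replace("a &gt; b&nbsp;&amp; c"))

# Alternative from the standard library; it also handles numeric entities.
print(html.unescape("a &gt; b &amp; c &#8225;"))
```

Note that html.unescape turns &nbsp; into a non-breaking space (U+00A0) rather than a plain space, which may or may not be what you want for titles.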
On Nov 7, 1:34 pm, "mp" <[EMAIL PROTECTED]> wrote:
> I have html document titles with characters like >, , and
> ‡. How do I decode a string with these values in Python?
>