Thanks guys, I tried this:

    from BeautifulSoup import BeautifulSoup
    import time  # needed for time.sleep() in the loop below
    import urllib

    def get_page_body_text(url):
        # Fetch the page and join every text node found under <body>.
        h = urllib.urlopen(url)
        data = h.read()
        soup = BeautifulSoup(data)
        body_texts = soup.body(text=True)
        text = ''.join(body_texts)
        return text

    ...

    while True:
        #print 'size=%d' % len(get_page_body_text('http://www.google.com'))
        print 'size=%d' % len(get_page_body_text('http://sampleblogbyjim.blogspot.com/'))
        time.sleep(5)

When google.com is the URL, the code gets the correct length of data. Then I tried a blog which I created for fun. This causes the code to crash with an error:

      File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
        raise HTMLParseError(message, self.getpos())
    HTMLParser.HTMLParseError: bad end tag: u"</scr' + 'ipt>",

Any idea how this can be taken care of? The blog site must be creating bad HTML. How do you deal with such a problem?

thanks
jim
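One workaround that might be worth trying, as a rough sketch only and not tested against the blog above: the bad end tag seems to come from JavaScript that builds a closing script tag by string concatenation, so removing the script blocks with a regular expression before the page reaches BeautifulSoup may sidestep the error. The helper name strip_script_blocks and the SCRIPT_BLOCK pattern are made up for illustration; they are not part of BeautifulSoup, and a regex like this will not cope with every malformed page.

```python
import re

# Hypothetical helper, not a BeautifulSoup API.
# Matches from an opening <script ...> tag to the first literal </script>,
# across newlines and regardless of case. A concatenated fragment such as
# "</scr' + 'ipt>" does not match the closing pattern, so it is swallowed
# together with the rest of the block.
SCRIPT_BLOCK = re.compile(r'<script\b.*?</script\s*>', re.IGNORECASE | re.DOTALL)


def strip_script_blocks(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_BLOCK.sub('', html)
```

In the function above this would presumably mean passing strip_script_blocks(data) to BeautifulSoup instead of data. Since only the body text is wanted, dropping the scripts should not lose anything that matters here.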