Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-18 Thread Sam's Lists
Perhaps a silly question, but what encoding is the page you are getting? You can check this by loading the page in Firefox, going to the View menu, and selecting "Character Encoding". That will tell you what Firefox thinks the encoding is. If it's not UTF-8, you'll probably have to convert it.
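
A minimal sketch of the "convert it" step, for illustration only. It assumes Python 3 (the thread itself uses Python 2.6), and the helper name `decode_with_fallback` and the list of candidate encodings are hypothetical, not something taken from html2text or from the page in question. The idea is to try UTF-8 first, fall back to a likely legacy encoding, and then re-encode as UTF-8:

```python
# Sketch only: the helper name and the candidate encodings are assumptions.
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this encoding does not fit the bytes, try the next one
    raise ValueError("none of the candidate encodings matched")


# 0xE9 is "é" in Latin-1/Windows-1252 but is not valid UTF-8 on its own,
# so decoding these bytes as UTF-8 raises UnicodeDecodeError.
raw_page = b"caf\xe9"
text, used = decode_with_fallback(raw_page)

# Re-encode so that tools expecting UTF-8 can read the result.
utf8_bytes = text.encode("utf-8")
```

If the encoding reported by the browser (or by the page's Content-Type header) is known, it is probably better to pass just that one name rather than guess, since a wrong fallback decodes without error but yields the wrong characters.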

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread Nikunj Badjatya
On Sun, Apr 17, 2011 at 11:17 PM, JAGANADH G wrote:
> On Sun, Apr 17, 2011 at 11:13 PM, Nikunj Badjatya <nikunjbadja...@gmail.com> wrote:
>> Hi,
>>
>> With stripogram it's working fine.
>> Thanks a lot. :) !!
>>
>> But I couldn't understand the reason behind the previous html2text
>> malfunc

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread JAGANADH G
On Sun, Apr 17, 2011 at 11:13 PM, Nikunj Badjatya wrote:
> Hi,
>
> With stripogram it's working fine.
> Thanks a lot. :) !!
>
> But I couldn't understand the reason behind the previous html2text malfunction
> for that particular (index1.htm) link.??!

because html2text encounters a problem with ut

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread Nikunj Badjatya
Hi,

With stripogram it's working fine.
Thanks a lot. :) !!

But I couldn't understand the reason behind the previous html2text malfunction for that particular (index1.htm) link.??!

On Sun, Apr 17, 2011 at 10:28 PM, JAGANADH G wrote:
> Hi
> Do the following things
>
> install the python li

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread Nikunj Badjatya
Dear Jaganadh,

I have tried with separate individual execution as

{{{
$ python html2text.py index1.htm
Traceback (most recent call last):
  File "../aaronsw-html2text-d9bf7d6/html2text.py", line 488, in
    data = data.decode(encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread JAGANADH G
On Sun, Apr 17, 2011 at 9:13 PM, Nikunj Badjatya wrote:
> Tried with the change.
> {{{
> ...
> ...
> - myunistr = smart_str(fetch)
> + myunistr = smart_str(fetch.read())
> ...
> ...
> }}}
>
> Output:
>
> {{{
> Traceback (most recent call last):
>   File "html2text.py", line 447, in
> dat

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread Nikunj Badjatya
Tried with the change.

{{{
...
...
- myunistr = smart_str(fetch)
+ myunistr = smart_str(fetch.read())
...
...
}}}

Output:

{{{
Traceback (most recent call last):
  File "html2text.py", line 447, in
    data = open(arg, 'r').read().decode(encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", l

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread JAGANADH G
On Sun, Apr 17, 2011 at 8:43 PM, Nikunj Badjatya wrote:
> Thanks for the quick reply.
> I have never touched Django before.
>
> I tried as:
>
> {{{
> #!/bin/python
>
> import os
> import urllib
> + from django.utils.encoding import smart_str
>
> fetch = urllib.urlopen("some-web-link.htm")
>
> ma

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread Nikunj Badjatya
Thanks for the quick reply.
I have never touched Django before.

I tried as:

{{{
#!/bin/python

import os
import urllib
+ from django.utils.encoding import smart_str

fetch = urllib.urlopen("some-web-link.htm")

mainfile = open ('main.html', 'w' )
+ myunistr = smart_str(fetch)
print myunistr
ma

Re: [BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

2011-04-17 Thread JAGANADH G
On Sun, Apr 17, 2011 at 8:01 PM, Nikunj Badjatya wrote:
> Hi All,
>
> I am working on a personal project that grabs certain URLs from the web, does
> some processing, and stores the final contents in a text/PDF file.
>
> I am also using html2text (
> https://github.com/aaronsw/html2text/archives/master )