Thanks for the quick reply. I have never touched Django before. I tried this:
{{{
#!/bin/python

import os
import urllib
+ from django.utils.encoding import smart_str

fetch = urllib.urlopen("some-web-link.htm")

mainfile = open ('main.html', 'w' )
+ myunistr = smart_str(fetch)
print myunistr
mainfile.write(myunistr)

os.system('python2.6 html2text.py main.html > main.txt')
}}}

The script ran without any errors. But when I opened "main.html", I expected
it to have the full contents of the page. Instead it has only:
{{{
<addinfourl at 148983116 whose fp = <socket._fileobject object at 0x8deabac>>
}}}

Please let me know if I am missing something.

Thanks,
Nikunj

On Sun, Apr 17, 2011 at 8:11 PM, JAGANADH G <jagana...@gmail.com> wrote:

> On Sun, Apr 17, 2011 at 8:01 PM, Nikunj Badjatya
> <nikunjbadja...@gmail.com> wrote:
>
> > Hi All,
> >
> > I am working on a self project for grabbing certain URL's from the web. Do
> > some processing and store the final contents in text/pdf file.
> >
> > I am also using html2text (
> > https://github.com/aaronsw/html2text/archives/master ) for converting the
> > fetched page into text format.
> > As a first step I tried with fetching and converting to text using
> > following code.
> >
> > Code :
> > {{{
> > #!/bin/python
> >
> > import os
> > import urllib
> >
> > fetch = urllib.urlopen("some-web-link.htm")
> >
> > mainfile = open ('main.html', 'w' )
> >
> > mainfile.write(fetch.read())
> >
> > os.system('python2.6 html2text.py main.html > main.txt')
> >
> > }}}
> >
> > It flags an error:
> > {{{
> > Traceback (most recent call last):
> >   File "html2text.py", line 447, in <module>
> >     data = open(arg, 'r').read().decode(encoding)
> >   File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
> >     return codecs.utf_8_decode(input, errors, True)
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x88 in position 11366:
> > invalid start byte
> >
> > }}}
> >
> > I also tried with
> > {{{
> > + import codecs
> >
> > ...
> > ...
> > - mainfile = open ('main.html', 'w' )
> > + mainfile = codecs.open('xyz.htm', 'w', None, 'ignore')
> >
> > ...
> > ...
> > }}}
> >
> > Result is coming the same.
> >
> > Please tell as to what can be done to avoid this error.?
> >
>
> Try this
>
> from django.utils.encoding import smart_str
>
> myunistr = smart_str(YOUR_STRING)
>
> This will solve the issue
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
> *ILUGCBE*
> http://ilugcbe.techstud.org
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
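
A note on the two symptoms in this thread. The `<addinfourl at ...>` text is the printed form of the response object that urllib.urlopen returns, not the page body, so it looks like smart_str was handed the object itself rather than the result of fetch.read(). The traceback also shows html2text.py decoding main.html as UTF-8, and byte 0x88 cannot start a UTF-8 sequence, which suggests the page is served in some other encoding. Below is a minimal sketch of one way to handle both points without Django: read the bytes, decode them with a best-guess charset, and write the file back out as UTF-8. It is written for Python 3 (urllib.request rather than the Python 2 urllib used above). The names decode_page, fetch_page and save_as_utf8 are hypothetical helpers, and the fallback order (declared charset, then UTF-8, then cp1252, then latin-1) is an assumption, not something taken from html2text.

```python
import urllib.request


def decode_page(raw, declared=None):
    """Decode raw page bytes to text, trying the declared charset first.

    The candidate order is an assumption made for this sketch.
    """
    for name in (declared, "utf-8", "cp1252"):
        if not name:
            continue
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # latin-1 maps every byte to a character, so this cannot fail.
    return raw.decode("latin-1")


def fetch_page(url):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    response = urllib.request.urlopen(url)
    try:
        raw = response.read()  # the bytes of the page, not the response object
        declared = response.headers.get_content_charset()
    finally:
        response.close()
    return decode_page(raw, declared)


def save_as_utf8(text, path):
    """Write text as UTF-8 so a later UTF-8 decode of the file succeeds."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Something like save_as_utf8(fetch_page("some-web-link.htm"), "main.html") would then leave main.html in UTF-8, which is what the decode call in the traceback expects.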