Here's a small script that reproduces the error.
I'm running Windows 7 with Python 3.1.

FILE : parseShift.py

import urllib.request as url
from html.parser import HTMLParser

class myParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
                print("Start of %s tag : %s" % (tag, attrs))
                

test = myParser()               
handle = url.urlretrieve("http://localhost/shift.html")
handleTemp = open( handle[0] , encoding="Shift-JIS" )
test.feed( handleTemp.read() )
handleTemp.close()

FILE : shift.html (encoded Shift-JIS)

<p class="thisisclass (not_in_japanese) reading_this_should_be_ok">Some random japanese <p><strong>東方プロジェクト</strong> <a href="#" title="キャプテン・ムラサ">Link</a>

OUTPUT

Start of p tag : [('class', 'thisisclass (not_in_japanese) reading_this_should_be_ok')]
Start of p tag : []
Start of strong tag : []
Traceback (most recent call last):
  File "D:\Dorian\Python\parseShift.py", line 12, in <module>
    test.feed( handleTemp.read() )
  File "C:\Python31\lib\html\parser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python31\lib\html\parser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "C:\Python31\lib\html\parser.py", line 268, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "D:\Dorian\Python\parseShift.py", line 6, in handle_starttag
    print("Start of %s tag : %s" % (tag, attrs))
  File "C:\Python31\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 44-52: character maps to <undefined>
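
The last frames of the traceback are in print() and cp1252.py, not in the parser, so the failure seems to come from writing the attribute text to the console. Below is a minimal sketch that reproduces it without HTMLParser or a web server. It assumes the console encoding is cp1252, as the traceback suggests; the helper name encodes_as and the backslashreplace workaround are only illustrations, not a confirmed fix.

```python
# Sketch only: reproduce the failure without HTMLParser or a web server.
# Assumption: the Windows console encoding is cp1252, as in the traceback.
attrs = [("href", "#"), ("title", "キャプテン・ムラサ")]
line = "Start of %s tag : %s" % ("a", attrs)


def encodes_as(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no katakana, so this is the step that print() trips over.
print("cp1252 ok:", encodes_as(line, "cp1252"))
print("shift_jis ok:", encodes_as(line, "shift_jis"))

# Possible workaround: escape whatever the console codec cannot represent.
escaped = line.encode("cp1252", "backslashreplace").decode("cp1252")
print(escaped)
```

If this behaves the same way on the real machine, the parsing itself is fine and only the display of the result needs a different encoding or an escaping step.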


Any help?
Dorian
--
http://mail.python.org/mailman/listinfo/python-list
