New submission from G. Scott Johnston:

I've come up with the following series of minimal examples to demonstrate my 
bug. 


>>> unicode("")
u''
>>> unicode("", errors="ignore")
u''


>>> unicode("abcü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>> unicode("abcü", errors="ignore")
u'abc'


>>> unicode(3)
u'3'
>>> unicode(3, errors="ignore")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, int found


>>> unicode(unicode(""))
u''
>>> unicode(unicode(""), errors="ignore")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported


The first two pairs of mini-programs show reasonable behaviour.  If the errors 
parameter is set to "ignore", no exception is raised; bytes that cannot be 
decoded are instead skipped in the output, as expected.  
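
For comparison, here is a minimal sketch of what the errors parameter does 
when the input really is a byte string.  It assumes the "ü" above was entered 
as UTF-8 (consistent with the 0xc3 byte in the traceback), and the variable 
names are only illustrative.

```python
# The byte string "abcü" as UTF-8: the ü is the two bytes 0xc3 0xbc.
raw = b"abc\xc3\xbc"

# Strict decoding (the default) stops at the first non-ASCII byte.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    strict_failed_at = exc.start  # index of the offending byte
else:
    strict_failed_at = None

# "ignore" drops the undecodable bytes; "replace" substitutes U+FFFD for each.
ignored = raw.decode("ascii", "ignore")
replaced = raw.decode("ascii", "replace")
```

Here "ignore" only changes how undecodable bytes are handled, which is the 
behaviour I expected from the errors parameter in every case.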

The third pair of mini-programs can be worked around by writing 
unicode(str(3), errors="ignore") instead.  This conversion should likely be 
done automatically, given that unicode(3) behaves as expected and properly 
converts between types.  Because the conversion happens automatically without 
the errors parameter, I believe there is a logic problem in the code: setting 
errors="ignore" changes the path of execution by more than just skipping 
characters that cause decoding errors.

The fourth pair of mini-programs is simply baffling.  The first mini-program 
clearly demonstrates that unicode() does in fact accept a Unicode object.  The 
fact that the second mini-program claims this is not supported further 
demonstrates that the logic depends on the errors="ignore" parameter more than 
it should.
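
Until this is resolved, the workaround for the third and fourth pairs can be 
wrapped in a small helper.  The following is only a sketch: to_text, text_type 
and bytes_type are hypothetical names, not part of Python, and the version 
check is an assumption added so that the same code also runs where the 
unicode builtin does not exist.

```python
import sys

# Assumption: pick the text and byte-string types for the running interpreter.
if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str


def to_text(obj, encoding="ascii", errors="ignore"):
    """Hypothetical helper: convert obj to text without raising TypeError."""
    if isinstance(obj, text_type):
        # Already text, so there is nothing to decode (fourth pair).
        return obj
    if not isinstance(obj, bytes_type):
        # Same idea as unicode(str(3), errors="ignore") (third pair).
        obj = str(obj)
        if isinstance(obj, text_type):
            # On interpreters where str() already yields text, stop here.
            return obj
    # A byte string: decode it, skipping bytes that cannot be decoded.
    return obj.decode(encoding, errors)
```

With this helper, to_text(3), to_text(u"") and to_text(b"abc\xc3\xbc") all 
return text, and only the last one actually consults the errors argument.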

----------
messages: 196350
nosy: G..Scott.Johnston
priority: normal
severity: normal
status: open
title: Encoding a unicode with unicode() and ignoring errors
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18863>
_______________________________________