Oh, my description of (5) is not totally correct: u'%r' % bytestring_value
is fine because repr(non_ascii_bytestring) is escaped 7-bit ASCII; this
means HttpResponseBase._convert_to_charset is almost fine (the bytes would be
in the wrong encoding, but this won't raise an exception). The argument about
"%r" % object_with_non_ascii__repr__ still applies.
On Friday, December 28, 2012, 1:20:24 UTC+6, Mikhail Korobov wrote:
>
> Hi there,
>
> First of all, many kudos for the Python 3.x support in upcoming django
> 1.5, and for the way it is handled (the approach, the docs, etc)!
>
> I think there are some pitfalls with the
> @python_2_unicode_compatible decorator as it is currently implemented in
> django (and with __str__/__repr__ in general), and I want to share my
> thoughts before the 1.5 release. I'm sorry that this message is pretty
> vague; it points to some problems with the current approach (some of them
> are real, some would occur very rarely) but it doesn't propose a solution
> for django other than "please review the code once more".
>
> 1) @python_2_unicode_compatible doesn't handle __repr__.
> For example, this affects django.db.models.options.Options,
> django.core.files.base.File (and ContentFile),
> django.contrib.admin.models.LogEntry, django.template.base.Variable
> and probably many others (their __repr__ incorrectly returns unicode).
>
> It may also be the reason why django.db.models.Model.__repr__ doesn't
> follow Python conventions ("__repr__ should be information-rich and
> unambiguous" - unicode values are replaced with "[Bad Unicode data]").
> By the way, the way django detects whether a value needs replacing
> is not correct and doesn't prevent all errors, because what
> "u = six.text_type(self)" does for a bytestring is decode the data using
> sys.getdefaultencoding(), while repr is (most?) often used in the console,
> where sys.stdout.encoding matters.
>
> 2) Under Python 2.x, __str__ is implemented as __unicode__
> encoded to utf8. This breaks 'print django_obj' when sys.stdout.encoding
> is not utf8, because print uses __str__ (not __unicode__) for custom
> objects, and the terminal expects the result to be encoded in
> sys.stdout.encoding (print encodes unicode strings to sys.stdout.encoding,
> but doesn't use __unicode__ of objects; this is hard-coded in Python 2.x).
> This may affect the REPL in Windows consoles and printing/writing to stdout
> in management commands.
>
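A minimal sketch of the mismatch described in (2). It runs on Python 3, with
bytes objects standing in for Python 2 bytestrings; cp866 is only an assumed
example of a non-utf8 console encoding, not something taken from django.

```python
# -*- coding: utf-8 -*-
text = u'привет'

# What a utf8-only __str__ would hand to print under Python 2.x.
utf8_bytes = text.encode('utf-8')

# What a console expecting cp866 would display for those bytes:
# every byte is shown as a separate, unrelated character.
shown = utf8_bytes.decode('cp866', 'replace')

# Encoding to the console's own encoding round-trips cleanly.
round_trip = text.encode('cp866').decode('cp866')
```

The text that reaches the screen differs from the original, and no exception
is raised, so the problem is easy to miss.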
> 3) @python_2_unicode_compatible produces incorrect results
> when applied twice (__str__ is already patched by the previous decorator
> application and returns a bytestring because of that).
> This is easy to overlook, e.g. when applying this decorator to a
> subclass of a class which is wrapped with @python_2_unicode_compatible
> and deleting the overridden __str__ afterwards.
>
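A sketch of the subclass scenario from (3). The decorator below is a
hypothetical stand-in (py2_style_compat is not django's name) that always
takes the Python 2 branch, so the effect can be inspected on any interpreter;
django's real decorator may differ in detail.

```python
# -*- coding: utf-8 -*-
def py2_style_compat(klass):
    # Hypothetical stand-in: keep the text-returning __str__ as __unicode__
    # and replace __str__ with a wrapper that returns utf8-encoded bytes.
    klass.__unicode__ = klass.__str__
    klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
    return klass


@py2_style_compat
class Base(object):
    def __str__(self):
        return u'привет'


# The bytes-returning wrapper installed by the first application.
first_wrapper = Base.__dict__['__str__']


class Child(Base):
    pass  # the overridden __str__ was deleted, the decorator was left in place


Child = py2_style_compat(Child)

# The second application copied the inherited wrapper, not a text-returning
# method, into Child.__unicode__.
copied = Child.__dict__['__unicode__']
```

After the second application, Child.__unicode__ is the wrapper that was meant
to serve as __str__, so it can no longer produce a unicode string.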
> 4) __str__ is not always properly implemented for this decorator in django
> code. To work properly with @python_2_unicode_compatible,
> __str__ must return a unicode string. This is quite subtle.
> For example, take a look at django.contrib.gis.maps.google.GEvent.
> __str__ is implemented as
> "return mark_safe('"%s", %s' %(self.event, self.action))",
> but "from __future__ import unicode_literals" is not applied to the
> file.
> This means that if event and action are Python objects with both __str__
> and __unicode__ methods defined (e.g. objects of a class wrapped with
> python_2_unicode_compatible) then __str__ would be called for these
> objects, not __unicode__ (because the format string is a bytestring).
> Generally, "%s" % something is a good and correct pattern for a __str__
> implementation (it does the right thing under both Python 2.x and 3.x when
> the unicode_literals future import is there), but it is incorrect under
> Python 2.x if unicode_literals is not imported.
>
> 5) %r is very tricky. If unicode_literals is in effect, or some
> arguments for string formatting are unicode,
> "%r" % obj would trigger bytes decoding using sys.getdefaultencoding()
> under Python 2.x (unless obj is a unicode string), and if obj.__repr__
> returns non-ascii text or obj is a bytestring, an exception would be raised
> (because sys.getdefaultencoding() is usually ascii).
> This format specifier is used, for example, in default_error_messages
> for django.db.models.fields.Field; after switching to unicode_literals
> this may start raising UnicodeDecodeError for non-ascii choices
> if they are custom objects (not unicode strings).
> Another example is
> django.http.response.HttpResponseBase._convert_to_charset
> where the BadHeaderError exception is raised: after switching to
> unicode_literals the %r format specifier starts triggering decoding of
> "value" using sys.getdefaultencoding(), which is incorrect because "value"
> is a bytestring in the 'charset' encoding under Python 2.x.
> Another example is django.utils.datastructures.SortedDict:
> its __repr__ uses '%r: %r' % (k, v) for k, v in six.iteritems(self)
> which may fail if the key is a unicode string and the value is a bytestring
> or an object with __repr__ returning non-ascii text. Another example
> is django.utils.encoding.DjangoUnicodeDecodeError
> (it has an incorrect __str__ by the way, because it returns unicode) -
> it uses "%r" for self.obj, with a unicode format string,
> and this would blow up if the __repr__ of obj returns non-ascii text.
> There are other places where %r is used and they are all fragile.
>
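A small sketch of the two %r cases, simulated on Python 3 with bytes objects
standing in for Python 2 bytestrings. The explicit decode('ascii') stands for
the implicit decoding with sys.getdefaultencoding(), assumed here to be ascii;
raw_repr is a made-up __repr__ result.

```python
# -*- coding: utf-8 -*-
data = u'привет'.encode('utf-8')

# Case 1: repr() of a bytestring escapes every non-ascii byte,
# so the result is plain 7-bit ascii and decoding it cannot fail.
escaped = repr(data)
escaped_is_ascii = all(ord(ch) < 128 for ch in escaped)

# Case 2: a hypothetical __repr__ result containing raw non-ascii bytes,
# which is possible for custom objects under Python 2.x.
raw_repr = b'<Obj: ' + data + b'>'
try:
    raw_repr.decode('ascii')  # stands for the implicit decoding step
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

Only the second case fails, so it is objects with a non-ascii __repr__ that
make unicode "%r" formatting fragile.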
> I've implemented another python_2_unicode_compatible decorator
> (inspired by django's, the idea is cool) for NLTK:
> https://github.com/nltk/nltk/blob/2and3/nltk/compat.py#L122 which
> resolves some of the issues above (it handles __repr__, limits __str__ and
> __repr__ to ascii and supports subclassing better). The article (rather
> lengthy, with some django bashing :) that provides the motivation for the
> decorator used in NLTK: http://kmike.ru/python-with-strings-attached/ (the
> code in the article is a bit outdated, it is not the code used in
> NLTK; the NLTK version was improved, but I haven't updated the article yet).
>
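To illustrate the "limit __repr__ to ascii" idea only: the sketch below is
not the NLTK code. ascii_repr and Word are hypothetical names, and the sketch
targets Python 3, where __repr__ returns text, so it sidesteps the Python 2
details discussed above.

```python
# -*- coding: utf-8 -*-
def ascii_repr(klass):
    """Hypothetical decorator: make repr() of instances 7-bit ascii."""
    original = klass.__repr__

    def __repr__(self):
        text = original(self)
        # Non-ascii characters are replaced with backslash escapes.
        return text.encode('ascii', 'backslashreplace').decode('ascii')

    klass.__repr__ = __repr__
    return klass


@ascii_repr
class Word(object):
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return u'<Word: %s>' % self.text


shown = repr(Word(u'привет'))
```

The escaped form is less readable for non-ascii text, but it can be written
to a console or embedded in another string whatever encoding is in effect.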