On Wed, Dec 4, 2013 at 9:32 AM, Hans-Peter Jansen <h...@urpla.net> wrote:
> I'm experiencing strange behavior with attached code, that differs depending
> on sys.setdefaultencoding being set or not. If it is set, the code works as
> expected, if not - what should be the usual case - the code fails with some
> non-sensible traceback.

Interesting. You're mixing str and unicode objects a lot here. The cleanest
solution, IMO, would be to either switch to Python 3 or add this to the top
of your code:

    from __future__ import unicode_literals

Either way, you'll have all your quoted strings be Unicode, rather than byte,
strings. Then take away the requirement that Unicode strings contain
non-ASCII characters, and let everything go through that code branch.

Looking at this line in reprstr():

    s = "u'%s'" % s.replace("'", "\\'")

Two potential problems with that. Firstly, the representation is flawed: a
backslash in the input string won't be changed, so it's not a true repr; but
if this is just for debugging output, that's not a big deal. Secondly, this
code might produce either a str or a unicode, depending on the type of s.
That may cause messes later; since you seem to be mostly working with the
unicode type after that, it'd probably be simpler/safer to make that always
return one:

    s = u"u'%s'" % s.replace("'", "\\'")

But the actual problem, I think, is that repr() guarantees to return a str,
and you're trying to return a unicode. Here's an illustration:

    # -*- coding: utf-8 -*-
    class Foo(object):
        def __repr__(self):
            return u'äöü'

    foo = Foo()
    print(foo.__repr__())
    print(repr(foo))

The first one succeeds, because building up that string isn't at all a
problem. The second one then tries to turn the return value of __repr__ into
a string using the default encoding - which defaults to 'ascii', hence the
problem you're seeing.

Solution 1: Switch to Python 3, in which this will work fine (because repr()
in Py3 returns a Unicode string, since _everything_ is Unicode).

Solution 2: Explicitly encode in frec, or at the end of Record.__repr__():

    def __repr__(self):
        s = u'%s(\n%s\n)' % (self.__class__.__name__, frec(self.__dict__))
        return s.encode("utf-8")

(that could be a one-liner, but it's already pushing 80-chars, so if you
have a length limit, breaking it helps)

Solution 3: Don't use __repr__ here, but simply have your frec function
intelligently handle Record types. Effectively, you have your own method of
generating a debug description of a Record, which could then return a
unicode instead of a str.

I personally recommend switching to Python 3 :) But presumably that's not an
option, or you'd already have considered it.

ChrisA
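
PS: Solution 3 is the only one without code, so here's a rough sketch of the
idea. I'm guessing at what Record and frec() look like, so treat everything
here as made up for illustration: this Record class, the describe() method
name, and this frec() body are all hypothetical stand-ins, not your actual
code. The point is only that nothing goes through repr(), so nothing gets
implicitly encoded to ASCII.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


class Record(object):
    """Hypothetical stand-in for the Record class in the original code."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def describe(self):
        # Hypothetical method name. Always returns text (unicode on
        # Python 2), so repr()'s "must be a str" rule never applies.
        body = frec(self.__dict__)
        return '%s(\n%s\n)' % (self.__class__.__name__, body)


def frec(mapping, indent=1):
    """Hypothetical formatter: one 'key: value' line per item, as text."""
    pad = '  ' * indent
    lines = []
    for key in sorted(mapping):
        value = mapping[key]
        if isinstance(value, Record):
            # Handle records directly instead of calling repr() on them.
            text = value.describe()
        else:
            text = '%s' % (value,)
        lines.append('%s%s: %s' % (pad, key, text))
    return '\n'.join(lines)


if __name__ == '__main__':
    rec = Record(name='äöü', n=1)
    # Printing still needs a terminal/stdout encoding that can show äöü.
    print(rec.describe())
```

You'd call rec.describe() wherever you currently rely on repr(rec) for debug
output. The trade-off is that the interactive prompt and containers like
lists will still use the default __repr__, so this only helps where you
control the formatting yourself.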