On Tue, 2007-10-30 at 07:26 +0000, Wiley wrote:
> Hi all,
>
> I'm writing a doctest for a simple model and I can't get it to pass.
>
> The problem seems to be that no matter how I enter the data into the
> test db, whether as a Unicode string or a utf-8 bytestring, and no
> matter what I tell my test to expect as a return value, the expected
> value is always rendered as a Unicode string (in this case a rendered
> Chinese character), whereas the value actually returned by the test is
> always a utf-8 bytestring.
>
> I'm using the latest django revision (6628), os x 10.4.10, python 2.5,
> a postgres 8.2, my test database is set to: TEST_DATABASE_CHARSET =
> 'utf8' in settings.py, and I've verified that the test database is
> indeed UTF8 after its been created. The models.py file has been saved
> in bbedit as a Unix file with UTF-8 encoding, and I put an explicit
> UTF-8 tag at the top of models.py for good measure.
>
> Here are my specific questions:
>
> 1.) Terminology: The expected result of the tests always seems to
> return '\xe4\xb8...' version of the chinese characters - this is the
> utf-8 bytestring, right?
>
> 2.) Am I entering the data correctly? I believe I correctly used both
> of the formats listed in the "Unicode data in Django" documentation
> (http://www.djangoproject.com/documentation/unicode/). Is there a
> more correct way of entering the data?
>
> 3.) Any ideas on how I could change this simple test to make it pass?

The bulk of your problems, I suspect, comes back to the fact that this is a doctest. The problem is, at least partially, that Python parses the file originally and sees everything inside the """...""" docstring as text and hence treats it as UTF-8 characters. So wrapping all those u'...' markers around things doesn't always do what you expect.

What is happening in your case is that you are creating the models with UTF-8 bytestrings, as a result of the entire docstring being encoded as UTF-8, not Unicode. After calling

    beijing = City.objects.create(...)

the 'beijing' object contains the data you initially assigned to the attributes (UTF-8 bytestrings). Django doesn't reload the instance from the database.

I had a terrible time trying to get Unicode tests to work for Django's core when I was writing them originally, because of this sort of behaviour. There are also problems where reporting errors that involve non-ASCII characters will often cause the doctest module to just explode. So you know something went wrong, but not what. You may not have hit that problem yet, but keep it in mind.

One solution is to reload the beijing and shanghai objects from the database, so that you see what they *really* look like. You could do something like:

    beijing = City.objects.get(pk=beijing.pk)

after the call to create(). This is actually a reasonable test, since, in reality, you usually create an object somewhere in your code and only later load it back to use it. If you're going to use an object straight after creation, you do need to be aware that the attributes contain exactly what you assigned to them, not what they would contain if you reloaded it from the database (so bytestrings, as opposed to unicode in this case).

Secondly, I would suggest making your docstring a Unicode docstring. So u"""...""" (note the initial 'u' prefix).

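To make the bytestring-versus-Unicode distinction above concrete (and yes, the '\xe4\xb8...' form is the UTF-8 bytestring), here is a minimal sketch in plain Python with no Django involved. It is written as an assumption for a current Python 3 interpreter, where `bytes` and `str` play the roles that `str` and `unicode` play in Python 2.5; the variable names are purely illustrative.

```python
# Sketch only: on Python 3, `bytes` stands in for Python 2's `str`
# (a bytestring) and `str` stands in for Python 2's `unicode`.

# "Shanghai" as a UTF-8 bytestring: six bytes, three per character.
name_bytes = b'\xe4\xb8\x8a\xe6\xb5\xb7'

# The same name as Unicode text: two code points.
name_text = u'\u4e0a\u6d77'

# The two values have different types and do not compare equal, which
# is why a doctest that expects one form and receives the other fails.
types_differ = type(name_bytes) is not type(name_text)
values_differ = name_bytes != name_text

# Decoding the bytestring as UTF-8 yields the Unicode text ...
decoded_matches = name_bytes.decode('utf-8') == name_text
# ... and encoding the text as UTF-8 yields the bytestring again.
encoded_matches = name_text.encode('utf-8') == name_bytes

print(len(name_bytes), len(name_text))
print(types_differ, values_differ, decoded_matches, encoded_matches)
```

So the expected and actual values in the failing test describe the same characters; they only differ in whether they are held as encoded bytes or as Unicode text.
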
Finally, and this is the one we use in Django's core tests in a lot of places, you can enter your non-ASCII data as UTF-8 and then convert it to unicode explicitly. So if you explicitly want to assign Unicode data to the attributes and want to ensure that the docstring encoding doesn't mess you up (or if, like me, you get tired of debugging it hour after hour and just want to get some work done instead of fighting problems in Python's library), you can write:

    name_cn = '\xe4\xb8\x8a\xe6\xb5\xb7'.decode('utf-8')

(or use smart_unicode() or unicode(..., 'utf-8') or whatever your favourite method might be).

In this case, I think your problem is simple: you are using the attributes as you originally created them instead of what would be reloaded from the database. But I thought I'd lay out all the things you are going to discover as you go further here. Hopefully somebody else with this problem will then find this in the archives.

Regards,
Malcolm

-- 
If at first you don't succeed, destroy all evidence that you tried.
http://www.pointy-stick.com/blog/