Malcolm,

Thanks _so_ much for this. I've learned a great deal from you, both from your responses on this forum and from reading your blog. I will continue to work on this issue and post to the thread if I come up with anything of archival value.
Wiley

On Oct 31, 10:33 am, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-10-30 at 07:26 +0000, Wiley wrote:
> > Hi all,
>
> > I'm writing a doctest for a simple model and I can't get it to pass.
>
> > The problem seems to be that no matter how I enter the data into the
> > test db, whether as a Unicode string or a utf-8 bytestring, and no
> > matter what I tell my test to expect as a return value, the expected
> > value is always rendered as a Unicode string (in this case a rendered
> > Chinese character), whereas the value actually returned by the test is
> > always a utf-8 bytestring.
>
> > I'm using the latest django revision (6628), os x 10.4.10, python 2.5,
> > a postgres 8.2, my test database is set to: TEST_DATABASE_CHARSET =
> > 'utf8' in settings.py, and I've verified that the test database is
> > indeed UTF8 after it's been created. The models.py file has been saved
> > in bbedit as a Unix file with UTF-8 encoding, and I put an explicit
> > UTF-8 tag at the top of models.py for good measure.
>
> > Here are my specific questions:
>
> > 1.) Terminology: The expected result of the tests always seems to
> > return '\xe4\xb8...' version of the chinese characters - this is the
> > utf-8 bytestring, right?
>
> > 2.) Am I entering the data correctly? I believe I correctly used both
> > of the formats listed in the "Unicode data in Django" documentation
> > (http://www.djangoproject.com/documentation/unicode/). Is there a
> > more correct way of entering the data?
>
> > 3.) Any ideas on how I could change this simple test to make it pass?
>
> The bulk of your problems, I suspect, come back to the fact that this is a
> doctest. The problem is, at least partially, that Python parses the file
> originally and sees everything inside the """...""" docstring as text
> and hence treats it as UTF-8 characters. So wrapping all those u'...'
> markers around things doesn't always do what you expect.
>
> What is happening in your case is that you are creating the models with
> UTF-8 bytestrings, as a result of the entire docstring being encoded as
> UTF-8, not Unicode. After calling
>
>     beijing = City.objects.create(...)
>
> the 'beijing' object contains the data you initially assigned to the
> attributes (UTF-8 bytestrings). Django doesn't reload the instance from
> the database.
>
> I had a terrible time trying to get Unicode tests to work for Django's
> core when I was writing them originally, because of this sort of
> behaviour. There are also problems where reporting errors that involve
> non-ASCII characters will often cause the doctest module to just
> explode. So you know something went wrong, but not what. You may not
> have hit that problem yet, but keep it in mind.
>
> One solution is to reload the beijing and shanghai objects from the
> database, so that you see what they *really* look like. You could do
> something like:
>
>     beijing = City.objects.get(pk=beijing.pk)
>
> after the call to create(). This is actually a reasonable test, since,
> in reality, you usually create an object somewhere in your code and only
> later load it back to use it. If you're going to use an object straight
> after creation, you do need to be aware that the attributes contain
> exactly what you assigned to them, not what they would contain if you
> reloaded it from the database (so bytestrings, as opposed to unicode in
> this case).
>
> Secondly, I would suggest making your docstring a Unicode docstring.
> So u"""...""" (note the initial 'u' prefix). Finally -- and this is the
> one we use in Django's core tests in a lot of places -- you can enter your
> non-ASCII data as UTF-8 and then convert it to unicode explicitly.
> So if you explicitly want to assign Unicode data to the attributes and
> want to ensure that the docstring encoding doesn't mess you up (or if,
> like me, you get tired of debugging it hour after hour and just want to
> get some work done instead of fighting problems in Python's library),
> you can write:
>
>     name_cn = '\xe4\xb8\x8a\xe6\xb5\xb7'.decode('utf-8')
>
> (or use smart_unicode() or unicode(..., 'utf-8') or whatever your
> favourite method might be).
>
> In this case, I think your problem is simple: it's the fact that you are
> using the attributes as you originally created them instead of what
> would be reloaded from the database. But I thought I'd lay out all the
> things you are going to discover as you go further here. Hopefully
> somebody else with this problem will then find this in the archives.
>
> Regards,
> Malcolm
>
> --
> If at first you don't succeed, destroy all evidence that you
> tried. http://www.pointy-stick.com/blog/
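
For anyone finding this in the archives, here is a minimal sketch of the bytestring-versus-Unicode distinction discussed above. It is an illustration, not code from the thread, and it assumes Python 3 syntax, where a bytestring literal needs a b'...' prefix and plain '...' is already Unicode (under Python 2.5, as used in the thread, the unprefixed literal is the bytestring). The variable name utf8_bytes is made up for the example; it touches no Django API.

```python
# Assumption: Python 3. In Python 2.5 the literal below would be written
# without the b prefix, as in the quoted message.
utf8_bytes = b'\xe4\xb8\x8a\xe6\xb5\xb7'  # UTF-8 bytestring from the thread

# Explicit conversion to Unicode text, as suggested in the quoted message.
name_cn = utf8_bytes.decode('utf-8')

# The bytestring holds six bytes; the decoded text holds two characters.
print(len(utf8_bytes), len(name_cn))

# The two values never compare equal, which is why a doctest that expects
# one form and receives the other fails.
print(utf8_bytes == name_cn)

# Encoding the text again gives back the original bytes.
print(name_cn.encode('utf-8') == utf8_bytes)
```

The same check could be applied to a model attribute before and after reloading the object, to see which of the two forms it currently holds.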