On Tue, 2007-10-30 at 07:26 +0000, Wiley wrote:
> Hi all,
> 
> I'm writing a doctest for a simple model and I can't get it to pass.
> 
> The problem seems to be that no matter how I enter the data into the
> test db, whether as a Unicode string or a utf-8 bytestring, and no
> matter what I tell my test to expect as a return value, the expected
> value is always rendered as a Unicode string (in this case a rendered
> Chinese character), whereas the value actually returned by the test is
> always a utf-8 bytestring.
> 
> I'm using the latest django revision (6628), os x 10.4.10, python 2.5,
> a postgres 8.2, my test database is set to: TEST_DATABASE_CHARSET =
> 'utf8' in settings.py, and I've verified that the test database is
> indeed UTF8 after its been created.  The models.py file has been saved
> in bbedit as a Unix file with UTF-8 encoding, and I put an explicit
> UTF-8 tag at the top of models.py for good measure.
> 
> Here are my specific questions:
> 
> 1.) Terminology: The expected result of the tests always seems to
> return '\xe4\xb8...' version of the chinese characters - this is the
> utf-8 bytestring, right?
> 
> 2.) Am I entering the data correctly?  I believe I correctly used both
> of the formats listed in the "Unicode data in Django" documentation
> (http://www.djangoproject.com/documentation/unicode/).  Is there a
> more correct way of entering the data?
> 
> 3.) Any ideas on how I could change this simple test to make it pass?

The bulk of your problems, I suspect, come back to the fact that this is a
doctest. The problem is, at least partially, that Python parses the file
originally and sees everything inside the """...""" docstring as a plain
string and hence treats its contents as UTF-8 bytes. So wrapping all those
u'...' markers around things doesn't always do what you expect.

What is happening in your case is that you are creating the models with
UTF-8 bytestrings, as a result of the entire docstring being encoded as
UTF-8, not Unicode. After calling 

        beijing = City.objects.create(...)
        
the 'beijing' object contains the data you initially assigned to the
attributes (UTF-8 bytestrings). Django doesn't reload the instance from
the database. 

I had a terrible time trying to get Unicode tests to work for Django's
core when I was writing them originally, because of this sort of
behaviour. There are also problems where reporting errors that involve
non-ASCII characters will often cause the doctest module to just
explode. So you know something went wrong, but not what. You may not
have hit that problem yet, but keep it in mind.

One solution is to reload the beijing and shanghai objects from the
database, so that you see what they *really* look like. You could do
something like:

        beijing = City.objects.get(pk=beijing.pk)
        
after the call to create(). This is actually a reasonable test, since,
in reality, you usually create an object somewhere in your code and only
later load it back to use it. If you're going to use an object straight
after creation, you do need to be aware that the attributes contain
exactly what you assigned to them, not what they would contain if you
reloaded it from the database (so bytestrings, as opposed to unicode in
this case).
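
To make the difference concrete, here is a toy sketch of that behaviour. It
is not Django code: ToyCity and ToyCityTable are hypothetical stand-ins I
made up for a model instance and a UTF-8 table, and the name_cn attribute is
assumed. The b'...' prefix is only there so the sketch also runs on newer
Pythons; on Python 2.5 you would drop it.

```python
class ToyCity(object):
    """Hypothetical stand-in for a model instance (not Django)."""

    def __init__(self, pk, name_cn):
        self.pk = pk
        self.name_cn = name_cn


class ToyCityTable(object):
    """Hypothetical stand-in for a table in a UTF-8 database."""

    def __init__(self):
        self._rows = {}

    def create(self, name_cn):
        pk = len(self._rows) + 1
        # The "database" always stores UTF-8 bytes.
        if isinstance(name_cn, bytes):
            stored = name_cn
        else:
            stored = name_cn.encode('utf-8')
        self._rows[pk] = stored
        # The returned instance keeps exactly what the caller assigned;
        # nothing is read back from storage.
        return ToyCity(pk, name_cn)

    def get(self, pk):
        # Loading decodes the stored bytes, so the caller gets unicode.
        return ToyCity(pk, self._rows[pk].decode('utf-8'))


cities = ToyCityTable()

# Create with a UTF-8 bytestring, as happens inside a non-Unicode docstring.
beijing = cities.create(b'\xe5\x8c\x97\xe4\xba\xac')
created_value = beijing.name_cn    # still the bytestring that was assigned

# Reload, and the attribute is now a unicode string.
beijing = cities.get(pk=beijing.pk)
reloaded_value = beijing.name_cn
```

The point of the sketch is only the asymmetry: created_value is whatever
went in, reloaded_value is what a fresh load would hand back.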

Secondly, I would suggest making your docstring a Unicode docstring.
So u"""...""" (note the initial 'u' prefix). Finally -- and this is the
one we use in Django's core tests in a lot of places -- you can enter your
non-ASCII data as UTF-8 and then convert it to unicode explicitly. So if
you explicitly want to assign Unicode data to the attributes and want to
ensure that the docstring encoding doesn't mess you up (or if, like me,
you get tired of debugging it hour after hour and just want to get some
work done instead of fighting problems in Python's library), you can
write:

        name_cn = '\xe4\xb8\x8a\xe6\xb5\xb7'.decode('utf-8')
        
(or use smart_unicode() or unicode(..., 'utf-8') or whatever your
favourite method might be).
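
As a small check of what that decode() call produces, here is a sketch that
uses only escape sequences, so the encoding of the source file cannot
interfere. The variable utf8_bytes is my own name for illustration, and the
b'...' prefix is an assumption to keep it runnable on newer Pythons (drop it
on Python 2.5).

```python
# The six UTF-8 bytes from the example above.
utf8_bytes = b'\xe4\xb8\x8a\xe6\xb5\xb7'

# Decoding turns the bytes into a unicode string of two characters.
name_cn = utf8_bytes.decode('utf-8')

# The same two characters written as code points, again ASCII-only source.
expected = u'\u4e0a\u6d77'

byte_length = len(utf8_bytes)   # counts bytes
char_length = len(name_cn)      # counts characters
round_trip = name_cn.encode('utf-8')
```

Comparing against escaped code points like this in the expected output of a
doctest keeps non-ASCII characters out of the docstring entirely.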

In this case, I think your problem is simple: it's the fact that you are
using the attributes as you originally created them instead of what
would be reloaded from the database. But I thought I'd lay out all the
things you are going to discover as you go further here. Hopefully
somebody else with this problem will then find this in the archives.

Regards,
Malcolm

-- 
If at first you don't succeed, destroy all evidence that you tried. 
http://www.pointy-stick.com/blog/

