Re: simple model with won't pass doctest that passes in a unicode string

Wiley Tue, 30 Oct 2007 20:36:04 -0800

Malcolm,

Thanks _so_ much for this.  I've learned a great deal from you both
from your responses on this forum and by reading your blog.  I will
continue to work on this issue and post to the thread if I come up
with anything of archival value.


Wiley

On Oct 31, 10:33 am, Malcolm Tredinnick <[EMAIL PROTECTED]>
wrote:
> On Tue, 2007-10-30 at 07:26 +0000, Wiley wrote:
> > Hi all,
>
> > I'm writing a doctest for a simple model and I can't get it to pass.
>
> > The problem seems to be that no matter how I enter the data into the
> > test db, whether as a Unicode string or a utf-8 bytestring, and no
> > matter what I tell my test to expect as a return value, the expected
> > value is always rendered as a Unicode string (in this case a rendered
> > Chinese character), whereas the value actually returned by the test is
> > always a utf-8 bytestring.
>
> > I'm using the latest django revision (6628), os x 10.4.10, python 2.5,
> > and postgres 8.2. My test database is set to: TEST_DATABASE_CHARSET =
> > 'utf8' in settings.py, and I've verified that the test database is
> > indeed UTF8 after it's been created.  The models.py file has been saved
> > in bbedit as a Unix file with UTF-8 encoding, and I put an explicit
> > UTF-8 tag at the top of models.py for good measure.
>
> > Here are my specific questions:
>
> > 1.) Terminology: The expected result of the tests always seems to
> > return the '\xe4\xb8...' version of the Chinese characters - this is the
> > utf-8 bytestring, right?
>
> > 2.) Am I entering the data correctly?  I believe I correctly used both
> > of the formats listed in the "Unicode data in Django" documentation
> > (http://www.djangoproject.com/documentation/unicode/).  Is there a
> > more correct way of entering the data?
>
> > 3.) Any ideas on how I could change this simple test to make it pass?
>
> The bulk of your problems, I suspect, come back to the fact that this is a
> doctest. The problem is, at least partially, that Python parses the file
> originally and sees everything inside the """...""" docstring as text
> and hence treats it as UTF-8 characters. So wrapping all those u'...'
> markers around things doesn't always do what you expect.
>
> What is happening in your case is that you are creating the models with
> UTF-8 bytestrings, as a result of the entire docstring being encoded as
> UTF-8, not Unicode. After calling
>
>         beijing = City.objects.create(...)
>
> the 'beijing' object contains the data you initially assigned to the
> attributes (UTF-8 bytestrings). Django doesn't reload the instance from
> the database.
>
> I had a terrible time trying to get Unicode tests to work for Django's
> core when I was writing them originally, because of this sort of
> behaviour. There are also problems where reporting errors that involve
> non-ASCII characters will often cause the doctest module to just
> explode. So you know something went wrong, but not what. You may not
> have hit that problem yet, but keep it in mind.
>
> One solution is to reload the beijing and shanghai objects from the
> database, so that you see what they *really* look like. You could do
> something like:
>
>         beijing = City.objects.get(pk=beijing.pk)
>
> after the call to create(). This is actually a reasonable test, since,
> in reality, you usually create an object somewhere in your code and only
> later load it back to use it. If you're going to use an object straight
> after creation, you do need to be aware that the attributes contain
> exactly what you assigned to them, not what they would contain if you
> reloaded it from the database (so bytestrings, as opposed to unicode in
> this case).
>
> Secondly, I would suggest making your docstring a Unicode docstring.
> So u"""...""" (note the initial 'u' prefix). Finally -- and this is the
> one we use in Django's core tests in a lot of places -- is to enter your
> non-ASCII data as UTF-8 and then convert it to unicode explicitly. So if
> you explicitly want to assign Unicode data to the attributes and want to
> ensure that the docstring encoding doesn't mess you up (or if, like me,
> you get tired of debugging it hour after hour and just want to get some
> work done instead of fighting problems in Python's library), you can
> write:
>
>         name_cn = '\xe4\xb8\x8a\xe6\xb5\xb7'.decode('utf-8')
>
> (or use smart_unicode() or unicode(..., 'utf-8') or whatever your
> favourite method might be).
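
A minimal sketch of that last approach, written so that it runs on a
modern Python 3 interpreter. That is an assumption: the thread itself is
about Python 2.5, where a plain '...' literal is already a bytestring and
the b prefix is not available. The name raw_name is only illustrative.

```python
# The same escaped UTF-8 bytes as in the quoted example. The b prefix marks
# the literal as a bytestring on Python 2.6+ and Python 3; on Python 2.5,
# drop the prefix.
raw_name = b'\xe4\xb8\x8a\xe6\xb5\xb7'

# Decode explicitly so the value is text, whatever the source file encoding.
name_cn = raw_name.decode('utf-8')

# Six bytes become two characters (U+4E0A and U+6D77).
assert len(raw_name) == 6
assert len(name_cn) == 2
assert name_cn == u'\u4e0a\u6d77'
```

Because the literal contains only ASCII escape sequences, the result does
not depend on how the file or the surrounding docstring was saved.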
>
> In this case, I think your problem is simple: it's the fact that you are
> using the attributes as you originally created them instead of what
> would be reloaded from the database. But I thought I'd lay out all the
> things you are going to discover as you go further here. Hopefully
> somebody else with this problem will then find this in the archives.
>
> Regards,
> Malcolm
>
> --
> If at first you don't succeed, destroy all evidence that you tried.
> http://www.pointy-stick.com/blog/
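
For the archives, a hedged sketch of how the escape-and-decode advice can
look inside a doctest. It assumes a modern Python 3 interpreter rather
than the Python 2.5 setup discussed in this thread, and decode_name is a
hypothetical helper standing in for the model code. The docstring is a raw
string containing only ASCII, so the file's encoding cannot change what
the test sees, and the example compares values instead of printing them,
so the result does not depend on how non-ASCII text is rendered.

```python
import doctest


def decode_name(raw):
    r"""Decode a UTF-8 bytestring into text (hypothetical helper).

    >>> decode_name(b'\xe4\xb8\x8a\xe6\xb5\xb7') == u'\u4e0a\u6d77'
    True
    """
    return raw.decode('utf-8')


# Parse and run the docstring directly instead of relying on testmod(),
# so the sketch behaves the same wherever it is executed.
test = doctest.DocTestParser().get_doctest(
    decode_name.__doc__, {'decode_name': decode_name}, 'decode_name', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Here results holds the failed and attempted counts for the single example.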

