Ivan Sagalaev <[EMAIL PROTECTED]> writes:

> David Abrahams wrote:
>> I've been running into a problem that seems very similar to
>> http://code.djangoproject.com/ticket/170, although I see that that
>> issue was fixed so I am betting the bug is on my end somewhere.
>> Unfortunately, I'm a little green w.r.t. unicode issues so I'm hoping
>> someone else can correct my misconceptions.
>> 
>> My app's templatetags/navigation.py file is enclosed.  If you look for
>> the string ".title()" you can see where the title() method on my Page
>> objects is getting called.  Unless I change that method to encode its
>> result as ascii or utf-8, I get the exception.  Can anyone explain
>> what's going on?  I suspect problems with mixing utf-8 and ascii
>> encoded strings, but I'm really out of my depth here.
>
> Looking at title() that you included:
>
>      def title(self):
>          return 'Home'
>
> it already returns an ASCII-encoded byte string. But I suspect your
> actual class returns a unicode object, right?

Frankly I am not sure, although it does seem like a possibility.  I
could check.

> If so, concatenating a unicode string with a byte string forces Python
> to decode the byte string into unicode implicitly, using the
> interpreter's default encoding (ASCII unless it has been changed). This
> automatic decoding is error-prone because it fails as soon as the byte
> string contains non-ASCII data, so it is always better to convert
> explicitly.

Oh, wow; I may well be doing that all over.  Is it possible to
instruct Python to treat that as an error rather than silently
doing something that I shouldn't be relying on?
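
For illustration, here is a minimal sketch of the strict behaviour asked
about, assuming Python 3 semantics: text is `str`, byte strings are
`bytes`, and mixing them raises TypeError instead of decoding silently.
The names `prefix`, `title` and `join_explicit` are hypothetical and are
not taken from navigation.py. The last part imitates by hand the implicit
ASCII decode that Python 2 performs, which is one way an exception like
the one described can arise:

```python
# Assumes Python 3: str holds code points, bytes holds raw octets.
prefix = b"Page: "        # hypothetical byte string from template-tag code
title = "Caf\u00e9"       # hypothetical unicode title with one non-ASCII char

# Mixing the two types is rejected outright instead of silently decoded.
try:
    prefix + title
    mixing_allowed = True
except TypeError:
    mixing_allowed = False


def join_explicit(byte_part, text_part, encoding="ascii"):
    """Decode the byte string explicitly, then concatenate as text."""
    return byte_part.decode(encoding) + text_part


joined = join_explicit(prefix, title)

# Imitation of Python 2's implicit step: decoding non-ASCII bytes as ASCII.
utf8_title = title.encode("utf-8")
try:
    utf8_title.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True
```

The point of the sketch is that every conversion names its encoding, so a
mismatch fails where the conversion happens rather than somewhere later.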

> Since most of the code in your template tag does its work using byte 
> strings it would be easier to encode title()'s output into a byte string 
> manually (an alternative would be converting all your tag's code to work 
> on unicode strings). 

Well, I did try that (or at least I thought so) but it didn't seem to
make the problem go away.

> The question is which byte encoding to use. The obvious choice is
> settings.DEFAULT_CHARSET, since that is the encoding of all your
> output. However, if you set DEFAULT_CHARSET to some legacy encoding
> (i.e. other than 'utf-8')

I don't set it to anything explicitly, so I have the default
DEFAULT_CHARSET.

> there might be cases (theoretically) where a unicode string contains
> characters that can't be encoded in it (for example, you can't have
> Russian characters in Western European windows-1252). So you may
> want to take a safety measure:
>
>      title().encode(settings.DEFAULT_CHARSET, 'xmlcharrefreplace')
>
> ... and all characters that cannot be encoded into DEFAULT_CHARSET will
> appear as numeric character references such as &#1040;, which is
> acceptable in HTML.
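
A minimal sketch of that safety measure, assuming Python 3 and a
hard-coded `DEFAULT_CHARSET` standing in for Django's
settings.DEFAULT_CHARSET (so the snippet runs without a Django project).
The sample title is made up for the example:

```python
# Assumption: a legacy charset stands in for settings.DEFAULT_CHARSET.
DEFAULT_CHARSET = "windows-1252"

# Cyrillic capital A (U+0410) followed by plain ASCII letters.
title = "\u0410bc"

# A strict encode fails, because windows-1252 has no Cyrillic letters.
try:
    title.encode(DEFAULT_CHARSET)
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With xmlcharrefreplace, unencodable characters become &#NNNN; references.
encoded = title.encode(DEFAULT_CHARSET, "xmlcharrefreplace")
```

Here `encoded` is the byte string b"&#1040;bc": the ASCII letters pass
through unchanged and only the unencodable character is replaced.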

Well, that's very instructive, thank you!  But again, my main concern
is that I'm probably doing something bad all over, by combining byte
strings with unicode strings.

By the way, is there a reference that describes Python's string
abstractions?  I take it from your use of the terms "unicode string"
and "byte string" that they are not in fact congruent to one another.
Let me guess: a byte string (e.g. 'abcdef') is a semantically void
container of bytes that might be used for ascii, utf-8, or something
else... and a "unicode string" is a container of code points?
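
That guess can be checked with a short sketch, assuming Python 3, where
the two abstractions are spelled `bytes` and `str` (in Python 2 they are
`str` and `unicode`). The sample word is arbitrary:

```python
# A text string: five code points, independent of any encoding.
text = "na\u00efve"

# The same text as bytes under two different encodings.
utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes
latin1_bytes = text.encode("latin-1")  # the accented letter takes one byte

lengths = (len(text), len(utf8_bytes), len(latin1_bytes))

# Bytes carry no record of their encoding: decoding with the wrong one
# succeeds here but yields different text.
round_trip = utf8_bytes.decode("utf-8")
misread = utf8_bytes.decode("latin-1")
```

The byte strings differ in length while the text stays the same, and the
same bytes read back as different text depending on the encoding chosen,
which is why the encoding has to be tracked separately from the bytes.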

Thanks again for your attention,

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

