Malcolm Tredinnick wrote:
> On Sun, 2008-01-06 at 15:25 -0600, Gary Wilson Jr. wrote:
>> It appears that at this point, response.content is a utf8-encoded bytestring.
>> I'm playing with a response middleware doing something like:
>>
>> MY_RE.sub(u'%s</body>' % text, response.content)
>>
>> which raises a UnicodeDecodeError if response.content contains non-ASCII bytes.
>>
>> I understand that the strings need to be of the same type, but was wondering
>> if response.content needs to be returned as a utf8-encoded bytestring or if
>> it's ok to convert it to unicode and return that.  Does it matter?
> 
> It must be UTF-8 (or, at least, a bytestring). Some encoding has to be in
> force, since "unicode" isn't a character encoding and response.content
> is the last station before we send stuff back to the web server.

So to make sure I've got this right, would either of the two examples below be
sufficient?

content = MY_RE.sub(u'%s</body>' % text, force_unicode(response.content))
content = content.encode('utf-8')

content = MY_RE.sub((u'%s</body>' % text).encode('utf-8'), response.content)
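
For comparison, here is a rough standalone sketch of both options that runs without Django. It is written for Python 3, where mixing bytes and text raises a TypeError rather than a UnicodeDecodeError and a bytes subject needs a bytes pattern. The pattern, the sample strings, and the function names are made up for illustration, and a plain .decode('utf-8') stands in for force_unicode.

```python
import re

# Hypothetical stand-ins for the names used above.
MY_RE = re.compile(r'</body>')          # text pattern
MY_RE_BYTES = re.compile(rb'</body>')   # bytes pattern (needed on Python 3)
text = '<p>caf\u00e9</p>'
content_bytes = '<html><body>na\u00efve</body></html>'.encode('utf-8')


def via_unicode(content, text):
    """Option 1: decode, substitute as text, re-encode to UTF-8."""
    decoded = content.decode('utf-8')   # stand-in for force_unicode()
    # Note: backslashes in the replacement string are treated as escapes.
    result = MY_RE.sub('%s</body>' % text, decoded)
    return result.encode('utf-8')


def via_bytes(content, text):
    """Option 2: encode the replacement and substitute on the bytestring."""
    replacement = ('%s</body>' % text).encode('utf-8')
    return MY_RE_BYTES.sub(replacement, content)


# Both routes should hand back the same UTF-8 bytestring.
same = via_unicode(content_bytes, text) == via_bytes(content_bytes, text)
```

Option 2 skips the decode/encode round trip, but it only works if the content really is UTF-8 and the pattern contains nothing that depends on character (rather than byte) semantics.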

> I realise this is slightly inconvenient for middleware classes, but
> since we cannot tell ahead of time if any middleware classes are going
> to be invoked, we have to treat response.content specially.

Could the handler not do the final encoding as the last thing it does on the
response's way out (so after any middleware has been processed)?
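
To illustrate what I mean, a minimal sketch of the idea. This is not Django's actual handler code; finalize and the middleware callables are hypothetical names, and the response is reduced to a plain string to keep it short.

```python
def finalize(response_text, middlewares, charset='utf-8'):
    """Run every middleware on text, then encode once at the very end."""
    for middleware in middlewares:
        # Each (hypothetical) middleware takes text and returns text.
        response_text = middleware(response_text)
    # Encoding is the last step before handing off to the web server.
    return response_text.encode(charset)


def add_footer(body):
    # Example middleware working purely on text, no encoding concerns.
    return body.replace('</body>', '<p>footer</p></body>')


payload = finalize('<body>caf\u00e9</body>', [add_footer])
```

With that ordering, middleware would never have to care about the encoding at all.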

Gary
