I've dealt with this by decoding the data with Python's codecs module:

import codecs

new_value = codecs.decode(current_value, 'utf-8', 'ignore')

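The default errors option is 'strict', which raises the kind of error
you've experienced (UnicodeDecodeError and UnicodeEncodeError are both
subclasses of ValueError); 'ignore' drops the offending characters, and
'replace' substitutes a replacement marker for the malformed data (U+FFFD
when decoding, '?' when encoding to ASCII).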
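A quick sketch of the three handlers side by side, written for Python 3 (the thread itself is Python 2, where `str` is the byte type and `unicode` the text type). The sample bytes and the helper names `decode_with` and `encode_ascii_with` are made up for illustration. The second helper covers the encoding direction, which is the one the quoted traceback actually fails on (a UnicodeEncodeError from the 'ascii' codec).

```python
import codecs

# Hypothetical input: UTF-8 bytes for "it's" (using U+2019) plus one invalid byte
raw = "it\u2019s".encode("utf-8") + b"\xff"


def decode_with(errors):
    """Decode raw as UTF-8 with the given error handler; return the exception on failure."""
    try:
        return codecs.decode(raw, "utf-8", errors)
    except UnicodeDecodeError as exc:
        # Only 'strict' gets here; UnicodeDecodeError is a subclass of ValueError
        return exc


def encode_ascii_with(errors):
    """Encode U+2019 to ASCII, the direction that fails in the quoted traceback."""
    try:
        return codecs.encode("\u2019", "ascii", errors)
    except UnicodeEncodeError as exc:
        # 'strict' raises: U+2019 is outside range(128)
        return exc


for handler in ("strict", "ignore", "replace"):
    print(handler, repr(decode_with(handler)), repr(encode_ascii_with(handler)))
```

With this input, 'ignore' yields "it\u2019s" with the bad byte dropped, while 'replace' keeps the text and appends U+FFFD. When encoding to ASCII, 'ignore' gives empty bytes and 'replace' gives b'?', so both silently lose the quotation mark.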

Sometimes I use a try/except like this:

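import codecs

try:
    # Coercing a byte string to unicode implicitly uses the ASCII codec
    new_value = u"%s" % current_value
except UnicodeDecodeError:
    # Fall back to decoding as UTF-8, dropping any undecodable bytes
    new_value = codecs.decode(current_value, 'utf-8', 'ignore')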
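A Python 3 take on the same fallback, as a sketch only. In Python 3, `u"%s" % some_bytes` no longer raises (it formats the value as `b'...'`), so the check has to be explicit. `to_text` is a hypothetical helper name, and UTF-8 is assumed as the source encoding. Unlike the snippet above, it tries a strict decode first and only drops bytes when that fails, so valid input is never altered.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes and falling back to 'ignore' (hypothetical helper)."""
    if isinstance(value, str):
        return value  # already text, nothing to do
    if not isinstance(value, (bytes, bytearray)):
        return str(value)  # e.g. numbers: same result as "%s" % value
    try:
        return value.decode(encoding)  # strict: keep every valid character
    except UnicodeDecodeError:
        # Last resort, like the codecs.decode(..., 'ignore') fallback
        return value.decode(encoding, "ignore")
```

For example, `to_text(b"it\xe2\x80\x99s")` returns the text with its right single quotation mark intact, whereas `to_text(b"ok\xff")` returns "ok".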


On Nov 17, 9:14 am, Tom Evans <tevans...@googlemail.com> wrote:
> Hi all
>
> I'm encountering a difficult-to-solve Unicode problem whilst saving data to
> the database. Worst of all, every attempt to reduce it to a simple test case
> or to reproduce it in the console fails(!). This is on Django 1.0.
>
> The process encountering the error is a simple daemon, run from a management
> command [1]. The process looks up a task [2] to run and executes it. After
> the task has finished executing, it updates the generated_content member on
> the model, either to contain any pertinent error messages if there was a
> failure, or to store rendered HTML if the task was successful.
>
> The problem occurs when the generated HTML contains particular Unicode
> characters (in this case, the right single quotation mark, \u2019), which for
> some reason prompts Django or MySQLdb to try to encode it as ASCII.
> The unicode HTML comes from rendering a Django template; here's the snippet
> that generates the HTML:
>
>       cdict = { ... } # left out; template renders correctly, so not important
>       ctxt = Context(cdict)
>       from django.template import loader
>       content = loader.render_to_string('the_template.html', context_instance=ctxt)
>       self.task.generated_content = content
>
> This code is called from MigrationTask::execute(), in the (working)
> PerformMigration class, and is the last thing that happens before we call
> save() on the modified instance. Apart from generated_content, the only
> other attribute this code changes on the model is status.
>
> When we do call save(), the following traceback is produced:
>
> Traceback (most recent call last):
>   File "/usr/local/www/django/ssosp/externals/identity_provider/tasks/management/commands/taskrunner.py", line 44, in handle
>     task.execute()
>   File "/usr/local/www/django/ssosp/externals/identity_provider/tasks/models.py", line 39, in execute
>     self.save()
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/base.py", line 307, in save
>     self.save_base(force_insert=force_insert, force_update=force_update)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/base.py", line 358, in save_base
>     rows = manager.filter(pk=pk_val)._update(values)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/query.py", line 429, in _update
>     return query.execute_sql(None)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/sql/subqueries.py", line 117, in execute_sql
>     cursor = super(UpdateQuery, self).execute_sql(result_type)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1700, in execute_sql
>     cursor.execute(sql, params)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/backends/mysql/base.py", line 83, in execute
>     return self.cursor.execute(query, args)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
>     query = query % db.literal(args)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/connections.py", line 247, in literal
>     return self.escape(o, self.encoders)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/connections.py", line 180, in string_literal
>     return db.string_literal(obj)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 1182: ordinal not in range(128)
>
> If I set a breakpoint where we generate the content, print out
> repr(content), copy and paste that into a Django Python shell and assign it
> to a task's generated_content property, it saves correctly.
>
> If I manually change content to u'\u2019' inside the debugger, it also saves
> correctly. It also works for u'\u2019'*2048, in case the size of the
> string matters.
>
> The database and all tables are set to UTF-8 in MySQL. My locale is
> correctly set up in both cases (en_GB.UTF-8). I'm very confused as to why it
> is attempting to convert it to ASCII :/
>
> Any hints/tips greatly appreciated.
>
> Cheers
>
> Tom
>
> [1] http://pastebin.com/m9e23563
> [2] http://pastebin.com/m564e1cd7

--

You received this message because you are subscribed to the Google Groups 
"Django users" group.