I've dealt with this by manipulating the data with Python's codecs module:

    import codecs
    new_value = codecs.decode(current_value, 'utf-8', 'ignore')

The default error handler is 'strict', which raises a UnicodeError (a
subclass of ValueError, like the one you've hit). 'ignore' drops the
offending bytes, and 'replace' substitutes a replacement marker (U+FFFD when
decoding) for the malformed data.

Sometimes I use a try/except like this:

    import codecs

    try:
        # Implicit conversion; fails if current_value holds non-ASCII bytes
        new_value = u"%s" % current_value
    except UnicodeError:
        # Fall back to an explicit UTF-8 decode that drops malformed bytes
        new_value = codecs.decode(current_value, 'utf-8', 'ignore')

On Nov 17, 9:14 am, Tom Evans <tevans...@googlemail.com> wrote:
> Hi all
>
> I'm encountering a difficult to solve unicode problem whilst saving data to
> the database. Worst of all, any attempt to reduce it to a simple test case,
> or reproduce it in the console fail(!). This is on django 1.0.
>
> The process encountering the error is a simple daemon, run from a management
> command [1]. The process looks up a task [2] to run and executes it. After
> the task has finished executing, it updates the generated_content member on
> the model, either to contain any pertinent error messages if there was a
> failure, or to store rendered HTML if the task was successful.
>
> The problem occurs when the generated HTML contains particular unicode
> characters (in this case, right single quotation mark, \u2019), which for
> some reason prompts django or MySQLdb to decide to convert it to unicode.
> The unicode HTML comes from rendering a django template; here's the snippet
> that generates the HTML:
>
>     cdict = { ... } # left out; template renders correctly, so not important..
>     ctxt = Context(cdict)
>     from django.template import loader
>     content = loader.render_to_string('the_template.html',
>                                       context_instance=ctxt)
>     self.task.generated_content = content
>
> This code is called from MigrationTask::execute() - this is in the (working)
> PerformMigration class - and is the last thing that happens before we call
> save() on the modified instance.
> Apart from the generated_content, the only
> other thing that changes on this model as a result of this code is the
> status attribute.
>
> When we do call save(), the following traceback is produced:
>
> Traceback (most recent call last):
>   File "/usr/local/www/django/ssosp/externals/identity_provider/tasks/management/commands/taskrunner.py", line 44, in handle
>     task.execute()
>   File "/usr/local/www/django/ssosp/externals/identity_provider/tasks/models.py", line 39, in execute
>     self.save()
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/base.py", line 307, in save
>     self.save_base(force_insert=force_insert, force_update=force_update)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/base.py", line 358, in save_base
>     rows = manager.filter(pk=pk_val)._update(values)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/query.py", line 429, in _update
>     return query.execute_sql(None)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/sql/subqueries.py", line 117, in execute_sql
>     cursor = super(UpdateQuery, self).execute_sql(result_type)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1700, in execute_sql
>     cursor.execute(sql, params)
>   File "/usr/local/www/django/ssosp/root/lib/python2.5/site-packages/django/db/backends/mysql/base.py", line 83, in execute
>     return self.cursor.execute(query, args)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
>     query = query % db.literal(args)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/connections.py", line 247, in literal
>     return self.escape(o, self.encoders)
>   File "/usr/local/lib/python2.5/site-packages/MySQLdb/connections.py", line 180, in string_literal
>     return db.string_literal(obj)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 1182: ordinal not in range(128)
>
> If I set a break point where we generate the content, print out
> repr(content), copy paste that into a django python shell and assign it to a
> task's generated_content property, it saves correctly.
>
> If I manually change content to u'\u2019' inside the debugger, it also saves
> correctly. It also works correctly for u'\u2019'*2048, just in case size of
> string matters.
>
> The database and all tables are set to UTF-8 in mysql. My locale is
> correctly set up in both cases (en_GB.UTF-8). I'm very confused as to why it
> is attempting to convert it to ascii :/
>
> Any hints/tips greatly appreciated.
>
> Cheers
>
> Tom
>
> [1] http://pastebin.com/m9e23563
> [2] http://pastebin.com/m564e1cd7
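
P.S. To make the difference between the three error handlers from my reply concrete, here is a minimal sketch. The sample byte string and the helper name decode_with are made up for illustration; they are not from Tom's code. It is written so it should run on a current Python (on Python 2 the b"..." literal is just a plain str).

```python
import codecs

# Hypothetical sample input: the UTF-8 bytes for u"it\u2019s" followed by a
# stray 0xff byte, which can never appear in well-formed UTF-8.
raw = b"it\xe2\x80\x99s\xff"


def decode_with(errors):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return codecs.decode(raw, "utf-8", errors)


ignored = decode_with("ignore")    # the malformed byte is silently dropped
replaced = decode_with("replace")  # the malformed byte becomes U+FFFD

try:
    decode_with("strict")          # the default handler
    strict_failed = False
except UnicodeDecodeError:         # a subclass of ValueError
    strict_failed = True
```

Bear in mind that 'ignore' throws data away without telling you, so 'replace' is probably the safer choice whenever you want to be able to spot later that something was lost.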