In article <roy-a1971d.12193930032...@news.panix.com>, Roy Smith <r...@panix.com> wrote:
> My unicode-fu is a bit weak. Are we looking at a Python problem, a
> MySQLdb problem, or a problem with the underlying MySQL server? We've
> certainly inserted utf-8 data before without any problems. It's
> possible this is the first time we've tried to handle a character
> outside the BMP.

Sigh. As is so often the case, I found the answer shortly after posting this.

http://stackoverflow.com/questions/1890693/

It turns out MySQL (at least the version we're running) can't handle characters outside the BMP!

OK, that leads to the next question. Is there any way I can (in Python 2.7) detect when a string is not entirely in the BMP? If I could find all the non-BMP characters, I could replace them with U+FFFD (REPLACEMENT CHARACTER) and life would be good (enough).

Apparently, newer versions of MySQL have utf8mb4, which can handle this. One possibility is upgrading to a newer MySQL, but if we could just catch and replace the non-BMP characters during ingestion, that would be a lot simpler.
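A minimal sketch of the detect-and-replace idea, assuming the text has already been decoded to a unicode string before it reaches the database layer. It is written to run on both Python 2.7 and Python 3. The catch on 2.7 is the build type: on a "wide" (UCS-4) build, and on Python 3.3 and later, a non-BMP character is a single code point, but on a "narrow" build it is stored as a UTF-16 surrogate pair. The sketch therefore picks its regular expression based on `sys.maxunicode`. The names `NON_BMP_RE`, `has_non_bmp` and `replace_non_bmp` are hypothetical helpers for illustration, not part of any library.

```python
import re
import sys

# The Basic Multilingual Plane (BMP) is U+0000..U+FFFF. Anything above it
# needs four bytes in UTF-8, which MySQL's 3-byte "utf8" charset cannot store.
if sys.maxunicode > 0xFFFF:
    # Wide builds (and Python 3.3+): a non-BMP character is one code point.
    NON_BMP_RE = re.compile(u'[\U00010000-\U0010FFFF]')
else:
    # Narrow builds: a non-BMP character is a high surrogate followed by a
    # low surrogate, so match the pair as one unit.
    NON_BMP_RE = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def has_non_bmp(text):
    """Return True if the unicode string contains any character above U+FFFF."""
    return NON_BMP_RE.search(text) is not None


def replace_non_bmp(text, replacement=u'\uFFFD'):
    """Replace every non-BMP character with `replacement` (U+FFFD by default)."""
    return NON_BMP_RE.sub(replacement, text)


if __name__ == '__main__':
    sample = u'snake \U0001F40D'  # U+1F40D SNAKE lies outside the BMP
    assert has_non_bmp(sample)
    assert not has_non_bmp(u'caf\u00e9')  # U+00E9 is inside the BMP
    assert replace_non_bmp(sample) == u'snake \uFFFD'
```

Calling `replace_non_bmp()` on each text field just before it is handed to the database driver would implement the "catch and replace during ingestion" approach. The trade-off is that the original characters are lost; moving the columns and the connection to utf8mb4 on a MySQL server that supports it would keep them instead.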