Hello, after two days of failed efforts and googling, I thought I had better seek advice or observations from the experts. I would be grateful for any input.
We have various small internal web applications that use UTF-8 pages for storing, searching and retrieving user input. They have worked fine for years with non-ASCII values, including Russian, Greek and lots of accented characters. They still do on an old version of Python (2.2.1), and there's nothing in the code to decode/encode the input; it has *just worked*.

Recently, however, while testing on a dev machine, I noticed that any characters outside ASCII cause the SQL statements to fail with UnicodeDecodeError exceptions on newer versions of Python (2.3 and 2.4). There are a number of threads online suggesting converting to unicode types, and similar themes, but I'm having no success. I am probably completely misunderstanding something fundamental. :-(

My first question is: did something change for normal byte stream usage, making it more strict? I'm surprised there aren't more problems like this reported online.

Is there a correct way to handle text input from a <FORM> when the page is UTF-8 and that input is going to be used in SQL statements? I've tried things like this (with no success):

    sql = u"select * from blah where col='%s'" % input

Doing

    sql = sql.decode('latin1')

prior to execution prevents some of the UnicodeDecodeError exceptions, but the data retrieved from the tables is no longer usable, causing breakage when it is used to create the output for the browser.

I really am at a loss as to what is going wrong, when everything works fine on crusty old 2.2.1. What are others doing to capture, store and output UTF-8 for the web?

Rgds,
Jason
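
For illustration only, here is a minimal sketch of the "convert to unicode" pattern those threads describe: decode the form bytes once on input, pass the value to the database as a parameter, and encode once on output. It is written for current Python rather than 2.3/2.4, and it rests on assumptions: sqlite3 is only a stand-in for whatever database is actually in use, and the sample input, the table `blah` and the column `col` are hypothetical. The explicit `.decode("ascii")` call mimics the implicit ASCII decode that Python 2 performs when a byte string is mixed with a unicode string.

```python
import sqlite3

# Bytes as they might arrive from a UTF-8 form submission (hypothetical input).
raw_input_bytes = "Ελληνικά".encode("utf-8")

# Python 2 decodes a byte string as ASCII when it is combined with a
# unicode string; this explicit call reproduces that step and its failure.
try:
    raw_input_bytes.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decode once at the input boundary, using the page's encoding.
text = raw_input_bytes.decode("utf-8")

# sqlite3 is an assumption standing in for the real database driver;
# the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blah (col TEXT)")
conn.execute("INSERT INTO blah (col) VALUES (?)", (text,))

# Pass the value as a query parameter instead of formatting it into the SQL.
rows = conn.execute("SELECT col FROM blah WHERE col = ?", (text,)).fetchall()

# Encode once on the way out to the browser.
output_bytes = rows[0][0].encode("utf-8")
conn.close()
```

Whether a given driver accepts and returns unicode values in this way depends on the driver and its version, so this shows the shape of the approach rather than a drop-in fix.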