Diez B. Roggisch wrote:

>> So then the easiest thing to do is: take the maximum length of a
>> unicode string you could possibly want to store, multiply it by 4 and
>> make that the length of the DB field.
>
>> However, I'm pretty convinced it is a bad idea to store Python unicode
>> strings directly in a DB, especially as they are not portable. I assume
>> that some DB connectors honour the local platform encoding already, but
>> I'd still say that UTF-8 is your best friend here.
>
> It was your assumption that the OP wanted to store the "real"
> unicode-strings. A moot point anyway, as it is afaik not possible to get
> their contents in byte form (except from a C-extension).
It is possible:

>>> u"a\xff\uffff\U0010ffff".encode("unicode-internal")
'a\x00\xff\x00\xff\xff\xff\xdb\xff\xdf'

This encoding is useless though, as you can't use it for reencoding on
another platform. (And it's probably not what the OP intended.)

> And assuming 4 bytes per character is a bit dissipative I'd say -
> especially when you have some > 80% ascii-subset in your text as
> european and american languages have.

That would require UTF-32 as an encoding, which Python currently
doesn't have.

> The solution was given before: choose an encoding (utf-8 is certainly
> the most favorable one), and compute the byte-string length.

Exactly!

Servus,
   Walter
--
http://mail.python.org/mailman/listinfo/python-list
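P.S.: A minimal sketch of the suggested approach, sizing by the encoded
byte length rather than the character count (the sample string is my own
illustration; in UTF-8 a character takes between 1 and 4 bytes, so 4 per
character is only the worst case):

```python
# -*- coding: utf-8 -*-
# Hypothetical example: measure the UTF-8 byte length of a unicode
# string before storing it in a byte-sized DB column.
s = u"a\xff\u20ac\U0010ffff"   # 1-, 2-, 3- and 4-byte UTF-8 characters
data = s.encode("utf-8")
print(len(data))               # 10 bytes for these 4 characters
```

With mostly-ASCII text the encoded length stays close to the character
count, which is why reserving 4 bytes per character is usually wasteful.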