Ethan Furman wrote:

> So I'm working with postgres, and I get a datadump which I try to restore
> to my test system, and I get this:
>
> ERROR: value too long for type character varying(4)
> CONTEXT: COPY res_currency, line 32, column symbol: "руб"
>
> "py6" sure looks like it should fit, but it don't. Further investigation
> revealed that "py6" is made up of the bytes d1 80 d1 83 d0 b1.
>
> Any ideas on what that means, exactly?
It may look like the ASCII "py6", but you have three Cyrillic letters:

>>> import unicodedata as ud
>>> [ud.name(c) for c in u"руб"]
['CYRILLIC SMALL LETTER ER', 'CYRILLIC SMALL LETTER U', 'CYRILLIC SMALL LETTER BE']

The bytes you are seeing are the UTF-8 encoding of those letters:

>>> u"руб".encode("utf-8")
'\xd1\x80\xd1\x83\xd0\xb1'

So postgres may be storing the string as UTF-8.
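
To make the size mismatch concrete, here is a minimal Python 3 sketch comparing the character count with the UTF-8 byte count. The variable names are only illustrative, and the idea that the test system counts bytes rather than characters is an assumption (it would depend on that database's encoding); the error message alone does not confirm it.

```python
import unicodedata

# The three Cyrillic letters from the COPY error, written as escapes so the
# snippet does not depend on the encoding of the source file.
symbol = "\u0440\u0443\u0431"

encoded = symbol.encode("utf-8")

char_count = len(symbol)    # number of code points
byte_count = len(encoded)   # number of bytes once encoded as UTF-8
hex_dump = " ".join(f"{b:02x}" for b in encoded)
names = [unicodedata.name(c) for c in symbol]

# Assumption: a system that treats every byte as one character would compare
# byte_count, not char_count, against the varchar(4) limit.
print(char_count, byte_count, hex_dump)
for name in names:
    print(name)
```

Each of these letters takes two bytes in UTF-8, so the string is three characters but six bytes long. That would exceed a limit of 4 wherever the bytes, rather than the characters, are being counted.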