On Tue, Jan 7, 2014 at 2:10 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 01/05/2014 06:55 PM, Chris Angelico wrote:
>>
>> It can't be both things. It's either bytes or it's text.
>
> Of course it can be:
>
> 0000000: 0372 0106 0000 0000 6100 1d00 0000 0000  .r......a.......
> 0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000  NAME.......C....
> 0000030: 1900 0000 0000 0000 0000 0000 0000 0000  ................
> 0000040: 4147 4500 0000 0000 0000 004e 1a00 0000  AGE........N....
> 0000050: 0300 0000 0000 0000 0000 0000 0000 0000  ................
> 0000060: 0d1a 0a                                  ...
>
> And there we are, mixed bytes and ascii data. As I said earlier, my example
> is minimal, but still very frustrating in that normal operations no longer
> work. Incidentally, if you were thinking that NAME and AGE were part of the
> ascii text, you'd be wrong -- the field names are also encoded, as are the
> Character and Memo fields.

That's alternating between encoded text and non-text bytes. Each
individual piece is either text or non-text, not both. The ideal way
to manipulate it would most likely be a simple decode operation that
turns this into (probably) a dictionary, decoding both the
structure/layout and the UTF-8 text in a single operation. A less
ideal (but more convenient) solution might involve what's currently
under discussion elsewhere: a (possibly partial) percent-formatting or
.format() method for bytes.

None of this changes the fact that there are bytes used to
store/transmit stuff, and abstract concepts used to manipulate them.
Just as nobody expects to be able to write a dict to a file without
some form of encoding (pickle, JSON, whatever), you shouldn't expect
to write a character string without first turning it into bytes.

ChrisA
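
For illustration, here is a rough sketch of what such a single decode operation could look like for the dump quoted above. It rests on assumptions the thread does not confirm: that the layout is a dBase III-style header (a 32-byte preamble, 32-byte field descriptors, and a 0x0D terminator), that the year byte is an offset from 1900, and that the text pieces are ASCII-compatible. The names decode_header and HEADER are made up for the example.

```python
import struct

# The first 0x61 bytes of the quoted dump, up to and including the 0x0D byte.
HEADER = bytes.fromhex(
    "0372 0106 0000 0000 6100 1d00 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "4e41 4d45 0000 0000 0000 0043 0100 0000 "
    "1900 0000 0000 0000 0000 0000 0000 0000 "
    "4147 4500 0000 0000 0000 004e 1a00 0000 "
    "0300 0000 0000 0000 0000 0000 0000 0000 "
    "0d"
)


def decode_header(raw, encoding="ascii"):
    """Turn the raw header bytes into a dict, decoding text pieces as we go.

    The layout is ASSUMED to be dBase III-like; `encoding` is a placeholder
    for whatever codepage the file really uses.
    """
    # Non-text part: version, date (yy, mm, dd), record count, two lengths.
    version, yy, mm, dd, nrecords, header_len, record_len = struct.unpack_from(
        "<4BIHH", raw, 0
    )

    fields = []
    offset = 32  # assumed: field descriptors start after a 32-byte preamble
    while raw[offset] != 0x0D:  # assumed: 0x0D ends the descriptor list
        # 11-byte name, 1-byte type code, 4 skipped bytes, length, decimals.
        name, ftype, length, decimals = struct.unpack_from(
            "<11sc4xBB", raw, offset
        )
        fields.append({
            # These two pieces are encoded text, so they get decoded.
            "name": name.rstrip(b"\x00").decode(encoding),
            "type": ftype.decode(encoding),
            # These two are plain numbers and stay numbers.
            "length": length,
            "decimals": decimals,
        })
        offset += 32

    return {
        "version": version,
        "last_update": (1900 + yy, mm, dd),  # assumed: year stored as year - 1900
        "records": nrecords,
        "header_len": header_len,
        "record_len": record_len,
        "fields": fields,
    }
```

Calling decode_header(HEADER) gives a dictionary whose "fields" entry lists NAME (type C, length 25) and AGE (type N, length 3). Every piece is handled as either text or a number, and the bytes/str boundary is crossed exactly once, inside the decoder.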