On Mon, Jan 15, 2018, at 09:35, Peter Otten wrote:
> Peng Yu wrote:
>
> > Can utf-8 encoded character contain a byte of TAB?
>
> Yes; ascii is a subset of utf8.
>
> If you want to allow fields containing TABs in a file where TAB is also the
> field separator you need a convention to escape the TABs occurring in the
> values. Nothing I see in your post can cope with that, but the csv module
> can, by quoting fields containing the delimiter:
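A minimal sketch of the csv approach described in the quote, assuming the default dialect settings (QUOTE_MINIMAL, double-quote quotechar) with only the delimiter changed to TAB. The sample rows are made up for illustration; the point is that a value containing a TAB gets quoted on write and comes back intact on read.

```python
import csv
import io

# Hypothetical sample data: the second row's comment contains a literal TAB.
rows = [["name", "comment"], ["alice", "likes\ttabs"], ["bob", "plain"]]

# Write with TAB as the field separator; QUOTE_MINIMAL (the default) quotes
# any field that contains the delimiter.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerows(rows)
text = buf.getvalue()
print(repr(text))

# Reading back with the same delimiter restores the embedded TAB.
parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
assert parsed == rows
```

Only the field that actually contains a TAB is wrapped in quotes; the others are written bare.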
Just to be clear, TAB *only* appears in utf-8 as the encoding of the actual TAB character, never as part of any other character's encoding. Every byte in the utf-8 encoding of a non-ascii character is 0x80 or above: a lead byte from 0xC2 through 0xF4, followed by one to three continuation bytes from 0x80 through 0xBF. So a 0x09 byte can never show up inside a multi-byte sequence.

-- 
https://mail.python.org/mailman/listinfo/python-list
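A brute-force check of the byte-range claim, as a sketch: it encodes every non-ascii code point with Python's built-in utf-8 codec (skipping the surrogates U+D800 through U+DFFF, which strict utf-8 cannot encode) and records which lead and continuation bytes occur. It takes on the order of a second to run over all ~1.1 million code points.

```python
TAB = 0x09

# utf-8 encodes TAB as the single byte 0x09, same as ascii.
assert "\t".encode("utf-8") == b"\x09"

lead_bytes = set()
cont_bytes = set()
for cp in range(0x80, 0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogate code points are not encodable in strict utf-8
    encoded = chr(cp).encode("utf-8")
    assert TAB not in encoded          # no 0x09 byte inside any multi-byte sequence
    assert 2 <= len(encoded) <= 4      # lead byte plus one to three continuation bytes
    lead_bytes.add(encoded[0])
    cont_bytes.update(encoded[1:])

print(hex(min(lead_bytes)), hex(max(lead_bytes)))  # 0xc2 0xf4
print(hex(min(cont_bytes)), hex(max(cont_bytes)))  # 0x80 0xbf
```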