On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>
>> Marko Rauhamaa wrote:
>>
>>> That said, UTF-8 does suffer badly from its not being
>>> a bijective mapping.
>>
>> Can you explain?
>
> In Python terms, there are bytes objects b that don't satisfy:
>
>    b.decode('utf-8').encode('utf-8') == b
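
For reference, here is a minimal sketch of how that round trip can be checked, assuming Python 3 and the default strict error handler. The helper name roundtrips and the sample byte strings are hypothetical, picked only for illustration:

```python
def roundtrips(b: bytes) -> bool:
    """Return True if b survives a strict UTF-8 decode/encode round trip.

    Raises UnicodeDecodeError if b is not well-formed UTF-8.
    """
    return b.decode('utf-8').encode('utf-8') == b


# Well-formed UTF-8 should come back unchanged.
valid_ok = roundtrips(b'hello') and roundtrips('caf\u00e9 \u20ac'.encode('utf-8'))

# Ill-formed inputs (illustrative only): a byte that never appears in UTF-8,
# a lone continuation byte, an overlong encoding of NUL, and an encoded
# surrogate code point.
samples = [b'\xff', b'\x80', b'\xc0\x80', b'\xed\xa0\x80']

rejected = []
for b in samples:
    try:
        roundtrips(b)
    except UnicodeDecodeError:
        # The strict decoder refused the input, so there is no str
        # object to re-encode and compare.
        rejected.append(b)
```

On a Python 3 interpreter I would expect valid_ok to be true and every sample to land in rejected, that is, each ill-formed input raises UnicodeDecodeError instead of decoding to something that re-encodes differently.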

Please provide an example; that sounds like a bug. If there is any
invalid UTF-8 stream which decodes without an error, it is actually a
security bug, and should be fixed pronto in all affected and supported
versions.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list