On Sat, 7 May 2022 at 19:02, MRAB <pyt...@mrabarnett.plus.com> wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry <ba...@barrys-emacs.org> wrote:
> >> You need to handle the file in bin mode and do the handling of line
> >> endings and encodings yourself. It’s not that hard for the cases you
> >> wanted.
> >
> >>>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> >>>> "".encode("utf-16")
> > b'\xff\xfe'
> >>>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> >>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
> >>> # Little-endian.
> >>> "\n".encode("utf-16le")
> b'\n\x00'
> >>> # Big-endian.
> >>> "\n".encode("utf-16be")
> b'\x00\n'
Well, ok, but I need a generic method to get LF and CR for any encoding a
user can input. Do you think that "\n".encode(encoding).lstrip("".encode(encoding))
works for any encoding? Furthermore, is there a way to get the encoding of
an opened file object?
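
For what it's worth, here is a minimal sketch of that idea (the function
name newline_bytes and the file name are mine, just for illustration). It
slices off the exact BOM prefix instead of calling lstrip(), because
lstrip() treats its argument as a *set* of bytes to strip rather than a
prefix, which could in principle also remove leading bytes of the newline
itself:

    def newline_bytes(encoding, newline="\n"):
        # "".encode(encoding) yields only the BOM, or b"" if the codec
        # writes none (e.g. "utf-8", "utf-16le", "latin-1").
        bom = "".encode(encoding)
        # Encoding the newline gives BOM + encoded newline; drop the BOM.
        encoded = newline.encode(encoding)
        assert encoded.startswith(bom)
        return encoded[len(bom):]

    newline_bytes("utf-16")    # b'\n\x00'
    newline_bytes("utf-16be")  # b'\x00\n'
    newline_bytes("utf-8-sig") # b'\n'

This should cover the common stateless encodings (UTF-8, the UTF-16/32
variants, the 8-bit codecs); I would not rely on it for stateful codecs
such as iso-2022-jp, where the byte pattern of a character can depend on
context.

As for the second question: a file opened in text mode is an
io.TextIOWrapper and exposes the encoding it is using as the .encoding
attribute; a file opened in binary mode has no such attribute, since no
decoding takes place:

    f = open("some_file.txt", encoding="utf-16")  # placeholder file name
    print(f.encoding)  # 'utf-16'

Note that .encoding only tells you what open() was told (or what default
it picked), not what the bytes on disk actually are.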