On 2022-05-07 19:47, Stefan Ram wrote:
> Marco Sulla <marco.sulla.pyt...@gmail.com> writes:
>> Well, ok, but I need a generic method to get LF and CR for any
>> encoding a user can input.
> "LF" and "CR" come from US-ASCII. It is theoretically
> possible that there might be some encodings out there
> (not for Unicode) that are not based on US-ASCII and
> have no LF or no CR.
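
For what it's worth, a codec-agnostic way to get those byte sequences is just to encode the two characters and let the codec complain if it can't. A minimal sketch; the helper name `newline_bytes` is made up for illustration, and codecs that prepend a BOM (plain "utf_16", for example) would need the explicit "_le"/"_be" variants instead:

```python
def newline_bytes(encoding):
    """Return (LF, CR) as byte sequences in the given encoding."""
    # str.encode raises LookupError for an unknown codec name and
    # UnicodeEncodeError if the codec cannot represent the character.
    return "\n".encode(encoding), "\r".encode(encoding)


lf, cr = newline_bytes("utf_16_le")
print(lf, cr)
```

The caller would still have to decide what to do when either exception is raised.
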
>> is good for any encoding? Furthermore, is there a way to get the
>> encoding of an opened file object?
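
On the second question: a file object opened in text mode has an `.encoding` attribute. It only reports the codec the file was opened with, not what the data really is, and binary file objects don't have it. A small sketch using a throwaway temporary file (the file name and contents are arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write the file as UTF-8, keeping the CR LF exactly as given.
    with open(path, "w", encoding="utf_8", newline="") as f:
        f.write("line one\r\n")

    # Reopen it with a different codec: .encoding echoes what we asked
    # for, it does not inspect the bytes on disk.
    with open(path, encoding="latin_1") as f:
        opened_with = f.encoding

    # A file opened in binary mode has no .encoding attribute at all.
    with open(path, "rb") as f:
        has_encoding = hasattr(f, "encoding")

print(opened_with, has_encoding)
```
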
> I have written a function that might be able to detect one
> of a few encodings based on a heuristic algorithm.
>
> import pathlib
>
> def encoding( name ):
>     path = pathlib.Path( name )
>     for encoding in ( "utf_8", "latin_1", "cp1252" ):
>         try:
>             with path.open( encoding=encoding, errors="strict" ) as file:
>                 text = file.read()
>             return encoding
>         except UnicodeDecodeError:
>             pass
>     return "ascii"
>
> Yes, it's potentially slow and might be wrong.
> The result "ascii" might mean it's a binary file.

"latin-1" will decode any sequence of bytes, so the loop will never reach
"cp1252", nor fall back to "ascii". Falling back to "ascii" is wrong
anyway, because the file could contain 0x80..0xFF, which aren't supported
by that encoding.
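
Both points are easy to check. The sketch below also shows one way to reorder the candidates so that the strictest codec is tried first and the catch-all goes last. The names `can_decode` and `guess_encoding` are made up for illustration, and the result is still only a guess:

```python
every_byte = bytes(range(256))

# latin-1 maps each byte 0x00..0xFF to the code point of the same value,
# so strict decoding of arbitrary bytes never raises.
assert len(every_byte.decode("latin_1", errors="strict")) == 256


def can_decode(data, encoding):
    """Return True if data decodes under the codec with errors='strict'."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# ASCII rejects every byte in 0x80..0xFF, so it is no use as a fallback.
assert not can_decode(b"\x80", "ascii")


def guess_encoding(data, candidates=("ascii", "utf_8", "cp1252", "latin_1")):
    """Return the first candidate that decodes data strictly, else None."""
    for encoding in candidates:
        if can_decode(data, encoding):
            return encoding
    return None  # not reached while "latin_1" is among the candidates


print(guess_encoding(b"caf\xe9"))
```

With that ordering, "latin_1" is only reported when everything stricter has failed, which says little more than "some bytes".
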
--
https://mail.python.org/mailman/listinfo/python-list