On 2019-07-15 at 21:43:50 +0200, Manfred Lotz wrote:

> Nowadays, usage of BOM for UTF-8 is neither required nor recommended.

Hi Manfred,

UTF-8 BOMs were never required. UTF-8 is a prefix encoding and was designed to be able to synchronize with a character stream which has no concept of "beginning of file". Suppose that you paste something from another file: one can expect neither that this file has a UTF-8 BOM at all nor that any program supports it.

> Does it mean that when you setup texworks encoding to be UTF-8, and
> subsequently load an ISO-8859-1 file texworks doesn't recognize the
> proper encoding?
>
> I know it doesn't help you but vim, for instance, recognizes the
> encoding of a loaded file although I have configured the default
> encoding as UTF-8

This can't work reliably at all. There is no way to determine which 8-bit encoding is being used. I suppose that vim is using ISO-8859-1 as a fallback encoding if the file contains byte sequences that are not valid UTF-8.

An 8-bit encoding can only be determined by heuristics or a priori knowledge. There are dependencies between languages and encodings. AFAIK Mozilla provided a program which tries to determine an encoding using such heuristics. I don't know how reliable it is, especially if files are small.

Phil, the best solution is to convert all the nasty 8-bit files to UTF-8 using iconv. Some years ago I compiled iconv for Windows:

  http://ms25.ddns.net/w32/iconv/iconv.zip

The ZIP file has to be extracted in a directory which is in PATH.

Usage:

  iconv --from-code=ISO-8859-1 --to-code=UTF-8 [--output=<outfile>] <infile>

If you omit --output, the input file will be replaced, so creating a backup beforehand is quite useful.

iconv supports zillions of encodings, try iconv --list.

Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotu...@web.de
------------------------------------------------------------------
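
P.S. For illustration, a minimal Python sketch of the fallback idea discussed above: try strict UTF-8 first, and only if that fails treat the bytes as ISO-8859-1. This is an assumption about how such a fallback could work, not a description of what vim or TeXworks actually do. The function names decode_with_fallback and convert_to_utf8 and the ".bak" backup suffix are made up for this example.

```python
import shutil
from pathlib import Path
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used): strict UTF-8 first, then ISO-8859-1."""
    try:
        # "utf-8-sig" accepts plain UTF-8 and also strips an optional BOM.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this never fails.
        # It is still only a guess: a file in any other 8-bit encoding
        # decodes without error too, just to the wrong characters.
        return data.decode("iso-8859-1"), "iso-8859-1"


def convert_to_utf8(path: Path) -> str:
    """Rewrite a non-UTF-8 file as UTF-8 in place, keeping a .bak copy."""
    data = path.read_bytes()
    text, used = decode_with_fallback(data)
    if used != "utf-8":
        # Hypothetical backup naming: original name plus ".bak".
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(text.encode("utf-8"))
    return used
```

Calling convert_to_utf8(Path("chapter1.tex")) on a hypothetical file would leave it untouched if it already decodes as UTF-8. The fallback is only right if the 8-bit files really are ISO-8859-1, which has to be known beforehand.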