Re: [XeTeX] [texworks] Use of BOM in XeTeX and TeXworks

Reinhard Kotucha Mon, 15 Jul 2019 15:32:36 -0700

On 2019-07-15 at 21:43:50 +0200, Manfred Lotz wrote:

 > Nowadays, usage of BOM for UTF-8 is neither required nor recommended.


Hi Manfred,
UTF-8 BOMs were never required.  UTF-8 is a self-synchronizing prefix
code and was designed so that a decoder can synchronize with a
character stream which has no concept of "beginning of file".  Suppose
that you paste something from another file.  You can expect neither
that this file has a UTF-8 BOM nor that every program supports one.
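
To make the point concrete, here is a minimal Python sketch.  The
function names strip_utf8_bom and resync are made up for this example.
It shows that a BOM carries no information the decoder needs, and that
a decoder dropped into the middle of a stream can find the next
character boundary just by skipping continuation bytes (bit pattern
10xxxxxx).

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present, else return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def resync(data: bytes, offset: int) -> int:
    """Return the first index >= offset that is not a continuation byte.

    Continuation bytes look like 10xxxxxx, so they can be skipped without
    knowing where the stream began.
    """
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


word = "Gr\u00fc\u00dfe"                 # "Gruesse" written with u-umlaut and sharp s
without_bom = word.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# With or without BOM, the payload is the same.
same_text = strip_utf8_bom(with_bom) == strip_utf8_bom(without_bom)

# Index 3 is the second byte of the u-umlaut; resync skips to the sharp s.
start = resync(without_bom, 3)
tail = without_bom[start:].decode("utf-8")
```

Only the character that was cut in half is lost; everything after it
decodes normally.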

 > Does it mean that when you setup texworks encoding to be UTF-8, and
 > subsequently load an ISO-8859-1 file texworks doesn't recognize the
 > proper encoding?
 >
 > I know it doesn't help you but vim, for instance, recognizes the
 > encoding of a loaded file although I have configured the default
 > encoding as UTF-8

This can't work reliably at all.  There is no way to determine with
certainty, from the bytes alone, which 8-bit encoding is being used.
I suppose that vim is using ISO-8859-1 as a fallback encoding if the
file contains byte sequences that are not valid UTF-8.
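
A fallback of this kind could look roughly like the following Python
sketch.  This is only an illustration of the idea, not a description of
what vim actually does, and decode_with_fallback is a name invented for
the example.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Try strict UTF-8 first; if that fails, assume ISO-8859-1.

    Returns the decoded text and the name of the encoding that was used.
    ISO-8859-1 maps every byte to a character, so the fallback never
    fails, whether or not the guess is right.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("iso-8859-1"), "iso-8859-1"


utf8_result = decode_with_fallback(b"caf\xc3\xa9")    # valid UTF-8
latin1_result = decode_with_fallback(b"caf\xe9")      # not valid UTF-8
```

Note that the fallback only tells you the file is not UTF-8.  It does
not tell you that it really is ISO-8859-1.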

An 8-bit encoding can only be determined by heuristics or a priori
knowledge.  There are dependencies between languages and encodings.
AFAIK Mozilla provided a program which tries to determine an encoding
using such heuristics.  I don't know how reliable it is, especially if
files are small.
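
A small Python sketch of why heuristics or a priori knowledge are
needed: the same bytes decode without error under more than one 8-bit
encoding, and only the context tells you which result was meant.  The
sample bytes and the list of candidates are chosen for illustration.

```python
# Byte 0xA4 is the generic currency sign in ISO-8859-1
# and the euro sign in ISO-8859-15.
data = b"10 \xa4"

candidates = ["iso-8859-1", "iso-8859-15"]
decoded = {name: data.decode(name) for name in candidates}

# Both decodings succeed, so the bytes alone cannot settle the question.
ambiguous = len(set(decoded.values())) > 1
```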

Phil, the best solution is to convert all the nasty 8-bit files to
UTF-8 using iconv.

Some years ago I compiled iconv for Windows:

  http://ms25.ddns.net/w32/iconv/iconv.zip

The ZIP file has to be extracted into a directory that is in PATH.

Usage:

  iconv --from-code=ISO-8859-1 --to-code=UTF-8 [--output=<outfile>] <infile>

If you omit --output, iconv normally writes the converted text to
standard output and leaves the input file untouched.  If you overwrite
the input file instead, create a backup first.

iconv supports zillions of encodings; try iconv --list to see them.
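
If iconv is not at hand, the same conversion can be sketched in a few
lines of Python.  The function name convert_to_utf8 and the ".bak"
suffix are assumptions made for this example, and the source encoding
still has to be known in advance.

```python
import shutil
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> Path:
    """Re-encode the file at path as UTF-8 in place.

    A backup copy with the original bytes is written next to the file
    first, and its path is returned.
    """
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)                      # keep the original bytes
    text = backup.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))          # bytes I/O keeps line endings as they are
    return backup
```

Usage would be something like convert_to_utf8(Path("chapter1.tex")),
where chapter1.tex is a placeholder file name.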

Regards,
  Reinhard

--
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotu...@web.de
------------------------------------------------------------------
