On 26 Dec 2018, at 11:34, Hussein Shafie wrote:

Kazuko O. wrote:

And when I edit a UTF-8 MDITA file with a BOM and save it using XXE,
the file is changed to a UTF-8 file without a BOM.
What do you think about the above case?

This is clearly an oversight. In the next version of XXE, a text file originally starting with a UTF-8 or UTF-16 BOM will be saved back to disk with its BOM.
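As a rough illustration of what "saved back to disk with its BOM" could mean (this is not XXE's actual code; the names `split_bom` and `edit_preserving_bom` are hypothetical, and only the UTF-8 and UTF-16 BOMs mentioned above are handled), a round trip might look like this in Python:

```python
import codecs

# BOMs considered here: only the UTF-8 and UTF-16 ones discussed in this thread.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def split_bom(data):
    """Return (bom, encoding, body); bom is b"" when the file has none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return bom, encoding, data[len(bom):]
    # Assumption for this sketch: a file without a BOM is treated as UTF-8.
    return b"", "utf-8", data

def edit_preserving_bom(data, edit):
    """Apply edit() to the decoded text and re-encode with the original BOM."""
    bom, encoding, body = split_bom(data)
    new_text = edit(body.decode(encoding))
    return bom + new_text.encode(encoding)
```

A file that started with a BOM gets the same BOM back, and a file without one stays without one.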

1) Great.

2) While it can be considered an oversight, it sounds very typical: I have been told, and it makes sense, that in many toolchains the BOM is an irritating feature. For example, if two files (each with its own BOM) are glued together (as in 'file1.txt' + 'file2.txt' = 'file1file2.txt'), the gluing process must make sure that the new file contains only one BOM, not two.
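To make the gluing problem concrete, here is a minimal Python sketch (the function name `concat_utf8` is made up for this example, and it assumes all inputs are UTF-8). It keeps whatever BOM the first file has and drops the BOM of every later file:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def concat_utf8(chunks):
    """Join UTF-8 byte strings so that no BOM ends up in the middle."""
    out = []
    for index, chunk in enumerate(chunks):
        # Only the first chunk may keep its BOM; later ones lose theirs.
        if index > 0 and chunk.startswith(BOM):
            chunk = chunk[len(BOM):]
        out.append(chunk)
    return b"".join(out)
```

A naive `b"".join(...)` of the same inputs would leave a second BOM in the middle of the result, where it is no longer a signature but a stray U+FEFF character.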

3) Can we expect the BOM to be retained for 'application' files as well, such as 'application/xhtml+xml' files? Currently it is not saved. It would be great to see this for such files as well!

Currently, for HTML files, when there is no <meta charset="UTF-8"/> element (but there is - or could be - a BOM), the XML encoding declaration is added:

A file that begins

* <BOM><!DOCTYPE html>

… is saved back to disk by XXE without the BOM but with the XML encoding declaration …

* <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>

Had the XML encoding declaration NOT been added, the user would end up in the same situation as for Kazuko’s .md files. Namely: The file would default to whatever the default is considered to be (for HTML files, the default is typically Windows-1252).

However, I would argue that

1) If anything should be added at all, then it should be the HTML encoding declaration:

* <!DOCTYPE html>[ ... snip ... ]<meta charset="UTF-8"/>

Why? Because an XML encoding declaration is not valid in HTML files served as text/html. Hence adding it is the wrong strategy.

2) Even if anything (the encoding declaration(s) of XML and/or HTML) is added, the BOM should - by default - be retained (even if there could be a user option not to retain it).

3) However, I would argue that at least there should be an option to not add an encoding declaration when there is a BOM.
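The three points above could be combined into something like the following Python sketch. It only illustrates the policy I am arguing for, not how XXE works; the function `save_html` and its `declare_with_bom` option are invented for the example, and the regular expressions are a crude stand-in for real HTML parsing:

```python
import codecs
import re

BOM = codecs.BOM_UTF8
# Crude checks, good enough for a sketch: an existing charset declaration,
# and the opening <head> tag (but not <header>).
HAS_META = re.compile(rb"<meta\s[^>]*charset", re.IGNORECASE)
HEAD_OPEN = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)

def save_html(original, body, declare_with_bom=False):
    """Return the bytes to write for an edited text/html document.

    original: the bytes as first read from disk (may start with a BOM)
    body:     the edited document, UTF-8 encoded, without a BOM
    """
    had_bom = original.startswith(BOM)
    # Point 3: with a BOM present, add nothing unless the user asks for it.
    wants_meta = declare_with_bom or not had_bom
    if wants_meta and not HAS_META.search(body):
        # Point 1: add the HTML declaration, never the XML one.
        body = HEAD_OPEN.sub(
            lambda m: m.group(0) + b'<meta charset="UTF-8"/>', body, count=1
        )
    # Point 2: the BOM is retained whenever the original had one.
    return (BOM if had_bom else b"") + body
```

With this policy, a file that begins with a BOM followed by `<!DOCTYPE html>` is written back unchanged, and a file with neither a BOM nor a declaration gets a `<meta charset="UTF-8"/>` element rather than an XML declaration.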
--
Leif Halvard Silli
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
