On 26 Dec 2018, at 11:34, Hussein Shafie wrote:
Kazuko O. wrote:
And when I edit the UTF-8 MDITA file with BOM and save it by using
xxe,
the file is changed to UTF-8 file without BOM.
What do you think about the above case?
This is clearly an oversight. In the next version of XXE, a text file
originally starting with a UTF-8 or UTF-16 BOM will be saved back to disk
with its BOM.
1) Great.
2) While it can be considered an oversight, it sounds very typical: I
have been told - and it makes sense - that in many toolchains, the BOM
is an irritating feature. For example, if two files (each with their own
BOM) are glued together (as in 'file1.txt' + 'file2.txt' =
'file1file2.txt'), the gluing process must make sure that the new
file contains only one BOM, not two.
3) Can we expect the BOM to be retained for 'application' files as well?
Such as for 'application/xhtml+xml' files? Currently it is not saved. It
would be great to see this for such files as well!
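The gluing problem from point 2) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name is made up, the inputs are assumed to be UTF-8 byte strings, and it says nothing about how any real toolchain does it.

```python
import codecs

def concat_without_extra_boms(chunks):
    """Join UTF-8 byte strings, keeping at most one leading BOM.

    Hypothetical helper: assumes every chunk is UTF-8 encoded.
    """
    bom = codecs.BOM_UTF8
    # Remember whether any input carried a BOM so the result can keep one.
    had_bom = any(c.startswith(bom) for c in chunks)
    # Strip the leading BOM (if present) from every chunk.
    stripped = [c[len(bom):] if c.startswith(bom) else c for c in chunks]
    body = b"".join(stripped)
    # Re-add a single BOM at the very start of the combined data.
    return bom + body if had_bom else body
```

A plain byte-wise concatenation would instead leave the second BOM in the middle of the file, where it is no longer a signature but a stray U+FEFF character.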
Currently, for HTML files, when there is no <meta charset="UTF-8"/>
element (but there is - or could be - a BOM), the XML encoding
declaration is added:
A file that begins with
* <BOM><!DOCTYPE html>
… is saved by XXE back to the computer without the BOM but with the
XML encoding declaration …
* <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>
Had the XML encoding declaration NOT been added, the user would end up
in the same situation as for Kazuko’s .md files. Namely: the file's
encoding would fall back to whatever default applies (for HTML
files, the default is typically Windows-1252).
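To make the consequence concrete, here is a rough Python sketch of a reader that trusts the BOM and otherwise falls back to Windows-1252. The function names and the fallback choice are assumptions for illustration; real HTML parsers also look at HTTP headers and `<meta>` elements, and UTF-32 BOMs are ignored here.

```python
import codecs

# BOM byte sequences and the encodings they signal (UTF-32 omitted).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data, fallback="cp1252"):
    """Return (encoding, bom_length); fall back when no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0

def decode(data):
    """Decode bytes using the sniffed encoding, skipping the BOM itself."""
    name, skip = sniff_encoding(data)
    return data[skip:].decode(name, errors="replace")

with_bom = codecs.BOM_UTF8 + "é".encode("utf-8")
without_bom = "é".encode("utf-8")
print(decode(with_bom))     # é
print(decode(without_bom))  # Ã©
```

The second line of output is the familiar mojibake: the same UTF-8 bytes, read as Windows-1252 because nothing in the file says otherwise.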
However, I would argue that
1) If anything should be added at all, then it should be the HTML
encoding declaration:
* <!DOCTYPE html>[ ... snip ... ]<meta charset="UTF-8"/>
Why? Because the XML encoding declaration is not valid in HTML
(text/html) files. Hence adding it is the wrong strategy.
2) Even if anything (the encoding declaration(s) of XML and/or HTML) is
added, the BOM should - by default - be retained (even if there could be
a user option to not retain it).
3) At the very least, there should be an option to not add an encoding
declaration when there is a BOM.
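The three points above could be sketched as a save policy like the following. This is purely hypothetical Python, not XXE's API: the function name, the `keep_bom` and `add_meta` options, and the naive string handling of `<head>` are all my own assumptions.

```python
import codecs

def serialize_html(text, had_bom, keep_bom=True, add_meta=False):
    """Return UTF-8 bytes for an HTML document under the proposed policy.

    Hypothetical sketch: never adds an XML declaration; optionally adds
    the HTML encoding declaration; keeps the BOM by default.
    """
    # Point 1: if anything is added, add <meta charset>, not <?xml ...?>.
    if add_meta and "<meta charset" not in text.lower() and "<head>" in text:
        text = text.replace("<head>", '<head><meta charset="UTF-8"/>', 1)
    data = text.encode("utf-8")
    # Point 2: a file that arrived with a BOM leaves with a BOM by default.
    if had_bom and keep_bom:
        data = codecs.BOM_UTF8 + data
    # Point 3: with add_meta=False nothing is added besides the BOM.
    return data
```

With the defaults, a file that was opened as `<BOM><!DOCTYPE html>` would be written back byte-for-byte the same at its start, which is all the user is asking for.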
--
Leif Halvard Silli
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support