On 12/30/2018 03:49 PM, Leif Halvard Silli wrote:
On 26 Dec 2018, at 11:34, Hussein Shafie wrote:
Kazuko O. wrote:
And when I edit a UTF-8 MDITA file with a BOM and save it
using XXE,
the file is changed to a UTF-8 file without a BOM.
What do you think about the above case?
This is clearly an oversight. In the next version of XXE, a text
file originally starting with a UTF-8 or UTF-16 BOM will be saved back
to disk with its BOM.
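For readers who want to see what "saved back to disk with its BOM" amounts to, here is a minimal sketch in Python. It is not XXE's implementation. The helper names split_bom and join_bom are hypothetical, only the UTF-8 and UTF-16 BOMs mentioned above are handled, and treating a BOM-less file as UTF-8 is an assumption of the sketch.

```python
import codecs

# BOMs covered by this sketch, paired with the codec for the bytes after them.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(data: bytes):
    """Return (bom, encoding, text); bom is b"" when the file has none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return bom, encoding, data[len(bom):].decode(encoding)
    # Assumption: a file without a BOM is decoded as UTF-8.
    return b"", "utf-8", data.decode("utf-8")


def join_bom(bom: bytes, encoding: str, text: str) -> bytes:
    """Re-encode the edited text and restore the BOM the file started with."""
    return bom + text.encode(encoding)
```

The idea is simply to remember the BOM at load time and write the same bytes back at save time, so a file that came in with a BOM leaves with one and a file that came in without one is left alone.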
1) Great.
2) While it can be considered an oversight, it sounds very typical: I
have been told - and it makes sense - that in many toolchains, the BOM
is an irritating feature. For example, if two files (each with their own
BOM) are glued together (as in 'file1.txt' + 'file2.txt' =
'file1file2.txt'), the gluing process must make sure that the new
file contains only one - and not two - BOMs.
3) Can we expect the BOM to be retained for 'application' files as well?
I'm sorry but the answer is no. Retaining the BOM will be limited to
plain text files.
Such as for 'application/xhtml+xml' files? Currently it is not saved. It
would be great if we would see this for such files as well!
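The gluing problem from point 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name concat_utf8 is hypothetical, all inputs are taken to be UTF-8, and the choice to emit one leading BOM whenever any input had one is a policy of the sketch, not something the thread prescribes.

```python
import codecs


def concat_utf8(parts):
    """Concatenate UTF-8 byte strings, keeping at most one leading BOM."""
    bom = codecs.BOM_UTF8
    # Assumed policy: the result starts with a BOM if any input did.
    had_bom = any(p.startswith(bom) for p in parts)
    # Strip the leading BOM from every part before joining.
    bodies = [p[len(bom):] if p.startswith(bom) else p for p in parts]
    return (bom if had_bom else b"") + b"".join(bodies)
```

A naive byte-level join of the two files would leave the second BOM in the middle of the result, which is why a toolchain has to treat the BOM specially at all.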
Currently, for HTML files, when there is no <meta charset="UTF-8"/>
element (but there is - or could be - a BOM), the XML encoding
declaration is added:
A file that begins
* <BOM><!DOCTYPE html>
… is saved by XXE back to the computer without the BOM but with the XML
encoding declaration …
* <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>
Had the XML encoding declaration NOT been added, the user would end up
in the same situation as for Kazuko’s .md files. Namely: The file would
default to whatever the default is considered to be (for HTML files, the
default is typically Windows-1252).
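To make the fallback concrete, here is a rough Python sketch of how a text/html consumer might pick an encoding: BOM first, then a meta charset near the start of the file, then a default. It is a simplification, not the full detection algorithm of any browser or of XXE. The name sniff_html_encoding, the regular expression, and the 1024-byte window are assumptions of the sketch, and it deliberately does not look at an XML encoding declaration.

```python
import codecs
import re

# Simplified stand-in for a meta prescan: finds charset=... inside a <meta> tag.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_html_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the encoding of HTML bytes: BOM, then meta charset, then fallback."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # Assumption: only the first 1024 bytes are searched for a declaration.
    match = META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback
```

With this sketch, a file consisting of a BOM followed by <!DOCTYPE html> is reported as UTF-8, while the same file with the BOM stripped and no meta element falls through to the Windows-1252 default.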
However, I would argue that
1) If anything should be added at all, then it should be the HTML
encoding declaration:
* <!DOCTYPE html>[ ... snip ... ]<meta charset="UTF-8"/>
Why? Because the addition of the XML encoding declaration is not valid
in (text/html) HTML files. Hence it is the wrong strategy to add it.
2) Even if anything (the encoding declaration(s) of XML and/or HTML) is
added, the BOM should - by default - be retained (even if there could be
a user option to not retain it).
3) However, I would argue that at least there should be an option to not
add an encoding declaration when there is a BOM.
We don't think there is a problem with the way 'application/xhtml+xml'
files are currently saved by XXE. Therefore we'll not implement what you
suggest.
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support