Control: tags 917314 patch
Control: tags 917398 patch
I'm using the attached patch formed by various patches from
the emacs-26 branch and others posted upstream[*]. It fixes
both bugs 917314 and 917398 and some other issues.
[*] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=18871
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32823
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887
It includes:
ca14dd1d4628094dd33d5d94694dcf5f29e843b8 (emacs-26 branch)
7dab3ee7ab54b3c2e7bc24170376054786c01d6f (emacs-26 branch)
3a1a36b0b42772f35c70fb7e996ba8fed787e1c2 [1]
a866e4f4b556fb4a346fa68c62296f10966690a1 [2]
e4cde42657f8f91f795e6b7041dc50b896dc468d (emacs-26 branch)
b8c659dc0c6ef41adc19db3c1884f73c1a10c8d8 [3]
2025fa25f76fd8a2df46fca8807ca386372757d5 [4]
d1520ab5b94d0f130955800ea11222a3702a5519 [5]
0c3e6a97f92dec31e7e186dae933c86700034089 [6]
290bbf4341c58339dafe46ed87b1979115306556 [7]
[1] 0001-Backport-sgml-syntax-propertize-rules-speedup-Bug-33.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887#58
[2] 0001-Fix-sgml-syntax-handling-of-quotes-in-comments.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887#73
[3] 0001-Keep-nxml-prolog-end-up-to-date-Bug-18871.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=18871#30
[4] 0001-Handle-lone-quote-500-characters-away-from-XML-tag-B.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887#94
[5] 0002-Handle-outside-SGML-XML-tags-Bug-33887.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887#94
[6] 0001-Don-t-fontify-text-outside-of-SGML-XML-tags-Bug-3388.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33887#124
[7] 0001-Don-t-sgml-syntax-propertize-inside-XML-prolog-Bug-3.patch
at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32823#39
--
Vincent Lefèvre <[email protected]> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Index: emacs-26.1+1/lisp/nxml/nxml-rap.el
===================================================================
--- emacs-26.1+1.orig/lisp/nxml/nxml-rap.el
+++ emacs-26.1+1/lisp/nxml/nxml-rap.el
@@ -35,35 +35,25 @@
;;
;; Our strategy is to keep track of just the problematic things.
;; Specifically, we keep track of all comments, CDATA sections and
-;; processing instructions in the instance. We do this by marking all
-;; except the first character of these with a non-nil nxml-inside text
-;; property. The value of the nxml-inside property is comment,
-;; cdata-section or processing-instruction. The first character does
-;; not have the nxml-inside property so we can find the beginning of
-;; the construct by looking for a change in a text property value
-;; (Emacs provides primitives for this). We use text properties
-;; rather than overlays, since the implementation of overlays doesn't
-;; look like it scales to large numbers of overlays in a buffer.
-;;
-;; We don't in fact track all these constructs, but only track them in
-;; some initial part of the instance.
+;; processing instructions in the instance. We do this by marking
+;; the first character of these with the generic string syntax by setting
+;; a 'syntax-table' text property in `sgml-syntax-propertize'.
;;
;; Thus to parse some random point in the file we first ensure that we
-;; have scanned up to that point. Then we search backwards for a
-;; <. Then we check whether the < has an nxml-inside property. If it
-;; does we go backwards to first character that does not have an
-;; nxml-inside property (this character must be a <). Then we start
-;; parsing forward from the < we have found.
+;; have scanned up to that point. Then we search backwards for a <.
+;; Then we check whether the < has the generic string syntax. If it
+;; does we go backwards to first character of the generic string (this
+;; character must be a <). Then we start parsing forward from the <
+;; we have found.
;;
;; The prolog has to be parsed specially, so we also keep track of the
;; end of the prolog in `nxml-prolog-end'. The prolog is reparsed on
;; every change to the prolog. This won't work well if people try to
;; edit huge internal subsets. Hopefully that will be rare.
;;
-;; We keep track of the changes by adding to the buffer's
-;; after-change-functions hook. Scanning is also done as a
-;; prerequisite to fontification by adding to fontification-functions
-;; (in the same way as jit-lock). This means that scanning for these
+;; We rely on the `syntax-propertize-function' machinery to keep track
+;; of the changes in the buffer. Fontification also relies on correct
+;; `syntax-table' properties. This means that scanning for these
;; constructs had better be quick. Fortunately it is. Firstly, the
;; typical proportion of comments, CDATA sections and processing
;; instructions is small relative to other things. Secondly, to scan
@@ -79,7 +69,15 @@
"Integer giving position following end of the prolog.")
(defsubst nxml-get-inside (pos)
- (save-excursion (nth 8 (syntax-ppss pos))))
+ "Return non-nil if inside comment, CDATA, or PI."
+ (let ((ppss (save-excursion (syntax-ppss pos))))
+ (or
+ ;; Inside comment.
+ (nth 4 ppss)
+ ;; Inside "generic" string which is used for CDATA, and PI.
+ ;; "Normal" double and single quoted strings are used for
+ ;; attribute values.
+ (eq t (nth 3 ppss)))))
(defun nxml-inside-end (pos)
"Return the end of the inside region containing POS.
@@ -110,6 +108,12 @@ Return nil if the character at POS is no
(setq nxml-prolog-regions (xmltok-forward-prolog))
(setq nxml-prolog-end (point))))
+(defun nxml-maybe-rescan-prolog (start _end _length)
+ "Reparse the prolog if START lies within it.
+`nxml-mode' adds this function on `after-change-functions'."
+ (when (<= start nxml-prolog-end)
+ (save-excursion
+ (nxml-scan-prolog))))
;;; Random access parsing
Index: emacs-26.1+1/test/lisp/nxml/nxml-mode-tests.el
===================================================================
--- /dev/null
+++ emacs-26.1+1/test/lisp/nxml/nxml-mode-tests.el
@@ -0,0 +1,125 @@
+;;; nxml-mode-tests.el --- Test NXML Mode -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Code:
+
+(require 'ert)
+(require 'nxml-mode)
+
+(defun nxml-mode-tests-correctly-indented-string (str)
+ (with-temp-buffer
+ (nxml-mode)
+ (insert str)
+ (indent-region (point-min) (point-max))
+ (equal (buffer-string) str)))
+
+(ert-deftest nxml-indent-line-after-attribute ()
+ (should (nxml-mode-tests-correctly-indented-string "
+<settings
+ xmlns=\"http://maven.apache.org/SETTINGS/1.0.0\"
+ xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"
+ xsi:schemaLocation=\"http://maven.apache.org/SETTINGS/1.0.0
+ https://maven.apache.org/xsd/settings-1.0.0.xsd\">
+ <mirrors>
+ ...
+ </mirrors>
+</settings>
+"))
+ (should (nxml-mode-tests-correctly-indented-string "\
+<x>
+ <abc xx=\"x/x/x/x/x/x/x/
+ y/y/y/y/y/y/
+ \">
+ <zzz/>
+ </abc>
+ <nl> </nl>
+</x>
+")))
+
+(ert-deftest nxml-balanced-close-start-tag-inline ()
+ (with-temp-buffer
+ (nxml-mode)
+ (insert "<a><b c=\"\"</a>")
+ (search-backward "</a>")
+ (nxml-balanced-close-start-tag-inline)
+ (should (equal (buffer-string) "<a><b c=\"\"></b></a>"))))
+
+(ert-deftest nxml-mode-font-lock-quotes ()
+ (with-temp-buffer
+ (nxml-mode)
+ (insert "<x a=\"dquote attr\" b='squote attr'>\"dquote text\"'squote
text'</x>")
+ (font-lock-ensure)
+ (let ((squote-txt-pos (search-backward "squote text"))
+ (dquote-txt-pos (search-backward "dquote text"))
+ (squote-att-pos (search-backward "squote attr"))
+ (dquote-att-pos (search-backward "dquote attr")))
+ ;; Just make sure that each quote uses the same face for quoted
+ ;; attribute values, and a different face for quoted text
+ ;; outside tags. Don't test `font-lock-string-face' vs
+ ;; `nxml-attribute-value' here.
+ (should (equal (get-text-property squote-att-pos 'face)
+ (get-text-property dquote-att-pos 'face)))
+ (should (equal (get-text-property squote-txt-pos 'face)
+ (get-text-property dquote-txt-pos 'face)))
+ (should-not (equal (get-text-property squote-txt-pos 'face)
+ (get-text-property dquote-att-pos 'face))))))
+
+(ert-deftest nxml-mode-doctype-and-quote-syntax ()
+ (with-temp-buffer
+ (insert "<!DOCTYPE t [\n<!ENTITY f SYSTEM \"f.xml\">\n]>\n<t>'</t>")
+ (nxml-mode)
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (car (syntax-ppss (1- (point-max))))))
+ (should (= 0 (car (syntax-ppss (point-max)))))))
+
+(ert-deftest nxml-mode-prolog-comment ()
+ (with-temp-buffer
+ (insert "<?xml version=\"1.0\" encoding=\"utf-8\"?><!-- comment1 -->
+<t><!-- comment2 --></t><!-- comment3 -->")
+ (nxml-mode)
+ ;; Check that all comments are parsed as comments
+ (goto-char (point-min))
+ (search-forward "comment1")
+ (should (nth 4 (syntax-ppss)))
+ (search-forward "comment2")
+ (should (nth 4 (syntax-ppss)))
+ (search-forward "comment3")))
+
+(ert-deftest nxml-mode-quote-in-long-text ()
+ (with-temp-buffer
+ (nxml-mode)
+ (insert "<t>"
+ ;; `syntax-propertize-wholelines' extends chunk size based
+ ;; on line length, so newlines are significant!
+ (make-string syntax-propertize-chunk-size ?a) "\n"
+ "'"
+ (make-string syntax-propertize-chunk-size ?a) "\n"
+ "</t>")
+ ;; If we just check (syntax-ppss (point-max)) immediately, then
+ ;; we'll end up propertizing the whole buffer in one chunk (so the
+ ;; test is useless). Simulate something more like what happens
+ ;; when the buffer is viewed normally.
+ (cl-loop for pos from (point-min) to (point-max)
+ by syntax-propertize-chunk-size
+ do (syntax-ppss pos))
+ (syntax-ppss (point-max))
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+ (car (syntax-ppss (point-max))))))))
+
+(provide 'nxml-mode-tests)
+;;; nxml-mode-tests.el ends here
Index: emacs-26.1+1/lisp/textmodes/sgml-mode.el
===================================================================
--- emacs-26.1+1.orig/lisp/textmodes/sgml-mode.el
+++ emacs-26.1+1/lisp/textmodes/sgml-mode.el
@@ -100,24 +100,20 @@ a DOCTYPE or an XML declaration."
:group 'sgml
:type 'hook)
-;; As long as Emacs's syntax can't be complemented with predicates to context
-;; sensitively confirm the syntax of characters, we have to live with this
-;; kludgy kind of tradeoff.
-(defvar sgml-specials '(?\")
+;; The official handling of "--" is complicated in SGML, and
+;; historically not well supported by browser HTML parsers.
+;; Recommendations for writing HTML comments is to use <!--...-->
+;; (where ... doesn't contain "--") to avoid the complications
+;; altogether (XML goes even further by requiring this in the spec).
+;; So there is probably no need to handle it "correctly".
+(defvar sgml-specials '(?\" ?\')
"List of characters that have a special meaning for SGML mode.
This list is used when first loading the `sgml-mode' library.
-The supported characters and potential disadvantages are:
+The supported characters are ?\\\", ?\\=', and ?-.
- ?\\\" Makes \" in text start a string.
- ?\\=' Makes \\=' in text start a string.
- ?- Makes -- in text start a comment.
-
-When only one of ?\\\" or ?\\=' are included, \"\\='\" or \\='\"\\=', as can
be found in
-DTDs, start a string. To partially avoid this problem this also makes these
-self insert as named entities depending on `sgml-quick-keys'.
-
-Including ?- has the problem of affecting dashes that have nothing to do
-with comments, so we normally turn it off.")
+Including ?- makes double dashes into comment delimiters, but
+they are really only supposed to delimit comments within DTD
+definitions. So we normally turn it off.")
(defvar sgml-quick-keys nil
"Use <, >, &, /, SPC and `sgml-specials' keys \"electrically\" when non-nil.
@@ -341,6 +337,12 @@ Any terminating `>' or `/' is not matche
(defvar sgml-font-lock-keywords sgml-font-lock-keywords-1
"Rules for highlighting SGML code. See also `sgml-tag-face-alist'.")
+(defun sgml-font-lock-syntactic-face (state)
+ "`font-lock-syntactic-face-function' for `sgml-mode'."
+ ;; Don't use string face outside of tags.
+ (cond ((and (nth 9 state) (nth 3 state)) font-lock-string-face)
+ ((nth 4 state) font-lock-comment-face)))
+
(eval-and-compile
(defconst sgml-syntax-propertize-rules
(syntax-propertize-precompile-rules
@@ -351,12 +353,33 @@ Any terminating `>' or `/' is not matche
("--[ \t\n]*\\(>\\)" (1 "> b"))
("\\(<\\)[?!]" (1 (prog1 "|>"
(sgml-syntax-propertize-inside end))))
- ;; Double quotes outside of tags should not introduce strings.
- ;; Be careful to call `syntax-ppss' on a position before the one we're
- ;; going to change, so as not to need to flush the data we just computed.
- ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
- (goto-char (match-end 0)))
- (string-to-syntax ".")))))))
+ ;; Quotes outside of tags should not introduce strings which end up
+ ;; hiding tags. We used to test every quote and mark it as "."
+ ;; if it's outside of tags, but there are too many quotes and
+ ;; the resulting number of calls to syntax-ppss made it too slow
+ ;; (bug#33887), so we're now careful to leave alone any pair
+ ;; of quotes that doesn't hold a < or > char, which is the vast majority.
+ ;; We also check quotes which are unpaired to end of line,
+ ;; otherwise we miss the case where the quote might "contain" an
+ ;; angle bracket outside of the current syntax-propertize chunk
+ ;; (this relies on `syntax-propertize-wholelines' being enabled).
+ ("\\([\"']\\)[^<>\"']*\\([<>\"']\\|$\\)"
+ (1 (unless (eq (char-after (match-beginning 1))
+ (char-after (match-beginning 2)))
+ ;; Be careful to call `syntax-ppss' on a position before the one
+ ;; we're going to change, so as not to need to flush the data we
+ ;; just computed.
+ (let ((ppss (syntax-ppss (match-beginning 0))))
+ ;; Can't rely on depth (nth 0 ppss), because we don't
+ ;; mark ">" outside of tags.
+ (if (prog1 (null (nth 9 ppss)) ; Outside tag.
+ (goto-char (1- (match-end 0)))
+ ;; If we're in a comment, don't skip over comment
+ ;; ender.
+ (when (nth 4 ppss)
+ (skip-chars-backward "- \t\n")))
+ (string-to-syntax "."))))))
+ )))
(defun sgml-syntax-propertize (start end)
"Syntactic keywords for `sgml-mode'."
@@ -552,7 +575,7 @@ Do \\[describe-key] on the following bin
;; This is desirable because SGML discards a newline that appears
;; immediately after a start tag or immediately before an end tag.
(setq-local paragraph-start (concat "[ \t]*$\\|\
-[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
+\[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
(setq-local paragraph-separate (concat paragraph-start "$"))
(setq-local adaptive-fill-regexp "[ \t]*")
(add-hook 'fill-nobreak-predicate 'sgml-fill-nobreak nil t)
@@ -570,7 +593,9 @@ Do \\[describe-key] on the following bin
(setq font-lock-defaults '((sgml-font-lock-keywords
sgml-font-lock-keywords-1
sgml-font-lock-keywords-2)
- nil t))
+ nil t nil
+ (font-lock-syntactic-face-function
+ . sgml-font-lock-syntactic-face)))
(setq-local syntax-propertize-function #'sgml-syntax-propertize)
(setq-local facemenu-add-face-function 'sgml-mode-facemenu-add-face-function)
(setq-local sgml-xml-mode (sgml-xml-guess))
Index: emacs-26.1+1/test/lisp/textmodes/sgml-mode-tests.el
===================================================================
--- emacs-26.1+1.orig/test/lisp/textmodes/sgml-mode-tests.el
+++ emacs-26.1+1/test/lisp/textmodes/sgml-mode-tests.el
@@ -125,11 +125,29 @@ The point is set to the beginning of the
(should (string= content (buffer-string))))))
(ert-deftest sgml-delete-tag-bug-8203-should-not-delete-apostrophe ()
- :expected-result :failed
(sgml-with-content
"<title>Winter is comin'</title>"
(sgml-delete-tag 1)
(should (string= "Winter is comin'" (buffer-string)))))
+(ert-deftest sgml-tests--quotes-syntax ()
+ (dolist (str '("a\"b <t>c'd</t>"
+ "a'b <t>c\"d</t>"
+ "<t>\"a'</t>"
+ "<t>'a\"</t>"
+ "<t>\"a'\"</t>"
+ "<t>'a\"'</t>"
+ "<t><!-- ' --></t>"
+ "<t><!-- \" --></t>"
+ ;; Yes, ">" is technically valid outside tags!
+ "<t>>'</t>"
+ ))
+ (ert-info (str :prefix "Test string: ")
+ (sgml-with-content
+ str
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+ (car (syntax-ppss (point-max))))))))))
+
(provide 'sgml-mode-tests)
;;; sgml-mode-tests.el ends here
Index: emacs-26.1+1/lisp/nxml/nxml-mode.el
===================================================================
--- emacs-26.1+1.orig/lisp/nxml/nxml-mode.el
+++ emacs-26.1+1/lisp/nxml/nxml-mode.el
@@ -423,6 +423,30 @@ reference.")
(when rng-validate-mode
(rng-validate-while-idle (current-buffer)))))
+(defvar nxml-prolog-end) ;; nxml-rap.el
+
+(defun nxml-syntax-propertize (start end)
+ "Syntactic keywords for `nxml-mode'."
+ ;; Like `sgml-syntax-propertize', but handle `nxml-prolog-regions'.
+ (when (< start nxml-prolog-end)
+ (catch 'done-prolog
+ (dolist (prolog-elem nxml-prolog-regions)
+ (let ((type (aref prolog-elem 0))
+ (pbeg (aref prolog-elem 1))
+ (pend (aref prolog-elem 2)))
+ (when (eq type 'comment)
+ (put-text-property pbeg (1+ pbeg)
+ 'syntax-table (string-to-syntax "< b"))
+ (put-text-property (1- pend) pend
+ 'syntax-table (string-to-syntax "> b")))
+ (when (> pend end)
+ (throw 'done-prolog t)))))
+ (setq start nxml-prolog-end))
+ (if (>= start end)
+ (goto-char end)
+ (goto-char start)
+ (sgml-syntax-propertize start end)))
+
(defvar tildify-space-string)
(defvar tildify-foreach-region-function)
@@ -518,8 +542,9 @@ Many aspects this mode can be customized
(nxml-with-invisible-motion
(nxml-scan-prolog)))))
(setq-local syntax-ppss-table sgml-tag-syntax-table)
- (setq-local syntax-propertize-function #'sgml-syntax-propertize)
+ (setq-local syntax-propertize-function #'nxml-syntax-propertize)
(add-hook 'change-major-mode-hook #'nxml-cleanup nil t)
+ (add-hook 'after-change-functions #'nxml-maybe-rescan-prolog nil t)
;; Emacs 23 handles the encoding attribute on the xml declaration
;; transparently to nxml-mode, so there is no longer a need for the below
@@ -540,7 +565,9 @@ Many aspects this mode can be customized
nil ; no special syntax table
(font-lock-extend-region-functions . (nxml-extend-region))
(jit-lock-contextually . t)
- (font-lock-unfontify-region-function . nxml-unfontify-region)))
+ (font-lock-unfontify-region-function . nxml-unfontify-region)
+ (font-lock-syntactic-face-function
+ . sgml-font-lock-syntactic-face)))
(with-demoted-errors (rng-nxml-mode-init)))
@@ -2378,7 +2405,9 @@ With a prefix argument, inserts the char
(put 'nxml-char-ref 'evaporate t)
(defun nxml-char-ref-display-extra (start end n)
- (when nxml-char-ref-extra-display
+ (when (and ;; Displaying literal newline is unhelpful.
+ (not (eql n ?\n))
+ nxml-char-ref-extra-display)
(let ((name (or (get-char-code-property n 'name)
(get-char-code-property n 'old-name)))
(glyph-string (and nxml-char-ref-display-glyph-flag
Index: emacs-26.1+1/lisp/font-lock.el
===================================================================
--- emacs-26.1+1.orig/lisp/font-lock.el
+++ emacs-26.1+1/lisp/font-lock.el
@@ -527,9 +527,12 @@ If nil, this is ignored, in which case t
sometimes be slightly incorrect.")
(make-variable-buffer-local 'font-lock-syntactically-fontified)
+(defun font-lock-syntactic-face-function-default (state)
+ "Default value for `font-lock-syntactic-face-function'."
+ (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+
(defvar font-lock-syntactic-face-function
- (lambda (state)
- (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+ #'font-lock-syntactic-face-function-default
"Function to determine which face to use when fontifying syntactically.
The function is called with a single parameter (the state as returned by
`parse-partial-sexp' at the beginning of the region to highlight) and