On Thu, Jul 2, 2020 at 1:18 PM Akash <akashkurde...@gmail.com> wrote:

> html.UnescapeString("Should this word, &currency, be unescaped?")
>
> https://play.golang.org/p/vN5bvfooq8H
>
> Aren't HTML entities supposed to end with a semicolon? See
> https://developer.mozilla.org/en-US/docs/Glossary/Entity
>
I think the answer is "yes"
<https://html.spec.whatwg.org/multipage/syntax.html#syntax-charref> when
sending, but "don't count on it" when parsing (or unescaping, in this case).

This comes under the heading of "Be conservative in what you do, be liberal
in what you accept from others"
<https://en.wikipedia.org/wiki/Robustness_principle> (and also backwards
compatibility).
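Here is a stripped-down sketch of what that "liberal" prefix fallback might look like. It is hypothetical and is not the code in escape.go. The four-entry `legacy` table and the `longestLegacyPrefix` name are made up for illustration; the real table carries the full list of semicolon-less names.

```go
package main

import "fmt"

// legacy is a tiny, hypothetical subset of the named references that HTML5
// still recognises without a trailing semicolon.
var legacy = map[string]rune{
	"amp":    '&',
	"copy":   '©',
	"curren": '¤',
	"not":    '¬',
}

// longestLegacyPrefix returns the rune for the longest key in legacy that is
// a prefix of name, along with how many bytes of name that key covers.
func longestLegacyPrefix(name string) (rune, int, bool) {
	for n := len(name); n > 0; n-- {
		if r, ok := legacy[name[:n]]; ok {
			return r, n, true
		}
	}
	return 0, 0, false
}

func main() {
	name := "currency" // the letters that follow '&' in "&currency,"
	if r, n, ok := longestLegacyPrefix(name); ok {
		// Decode the matched prefix and keep the leftover letters verbatim.
		fmt.Printf("%c%s\n", r, name[n:]) // prints "¤cy"
	}
}
```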

The HTML standard's parsing section on named character references says
<https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state>:

> If the character reference was consumed as part of an attribute
> <https://html.spec.whatwg.org/multipage/parsing.html#charref-in-attribute>,
> and the last character matched is not a U+003B SEMICOLON character (;), and
> the next input character
> <https://html.spec.whatwg.org/multipage/parsing.html#next-input-character> is
> either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric
> <https://infra.spec.whatwg.org/#ascii-alphanumeric>, then, for historical
> reasons, flush code points consumed as a character reference
> <https://html.spec.whatwg.org/multipage/parsing.html#flush-code-points-consumed-as-a-character-reference>
> and switch to the return state
> <https://html.spec.whatwg.org/multipage/parsing.html#return-state>.
>
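Below is a hypothetical helper that applies just that one check. `decodeInAttribute` is my own name, not anything in the standard library. As far as I can tell, html.UnescapeString has no notion of attribute context, so it always behaves like the text case. This sketch only shows what the spec asks of a tokenizer:

```go
package main

import "fmt"

// isASCIIAlnum follows the infra spec's "ASCII alphanumeric" definition.
func isASCIIAlnum(c byte) bool {
	return '0' <= c && c <= '9' || 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// decodeInAttribute reports whether a named reference whose matched text is
// `matched` (the characters after '&') should be decoded inside an attribute
// value. next is the following input byte; hasNext is false at end of input.
// Hypothetical sketch of the single historical rule quoted above only.
func decodeInAttribute(matched string, next byte, hasNext bool) bool {
	if len(matched) > 0 && matched[len(matched)-1] == ';' {
		return true // a terminated reference is always decoded
	}
	if hasNext && (next == '=' || isASCIIAlnum(next)) {
		return false // flush "&name" literally, "for historical reasons"
	}
	return true
}

func main() {
	// In href="?a=1&copy=2" the tokenizer matches "copy" and then sees '='.
	fmt.Println(decodeInAttribute("copy", '=', true))  // false: kept as "&copy"
	fmt.Println(decodeInAttribute("copy;", '=', true)) // true
	fmt.Println(decodeInAttribute("copy", '"', true))  // true: decoded to ©
}
```

So something like `href="?a=1&copy=2"` keeps its query string intact. That is presumably the historical reason the spec has in mind.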

It looks like the semicolon wasn't strictly required in HTML 4.01 either; the
section on entity references
<https://www.w3.org/TR/html401/charset.html#entities> includes this note:

> *Note. In SGML, it is possible to eliminate the final ";" after a
> character reference in some cases (e.g., at a line break or immediately
> before a tag). In other circumstances it may not be eliminated (e.g., in
> the middle of a word). We strongly suggest using the ";" in all cases to
> avoid problems with user agents that require this character to be present.*
>

(which, fortunately, aligns with my feeling from the late-90s that the
trailing semicolon was suggested but optional in HTML 4)

>
> I couldn't see any edge cases mentioned in the source
> <https://github.com/golang/go/blob/master/src/html/escape.go#L182> .
>
> Thanks.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/baea05b9-6634-495b-a45f-78f02ec7a20bn%40googlegroups.com
> <https://groups.google.com/d/msgid/golang-nuts/baea05b9-6634-495b-a45f-78f02ec7a20bn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
