Hi,

> On 26 Dec 2020, at 12:02, Craig Francis <cr...@craigfrancis.co.uk> wrote:
> 
> (...)
> PHP uses the numeric version &#039; with ENT_QUOTES, and it should continue
> to do so - because the named version, &apos; was added in HTML5, but can
> still cause problems with legacy parsers; for example Android 4, and the
> one still in use by Microsoft Outlook (&amp;/&gt;/&lt; was in the
> original HTML spec, and &quot; was added in HTML2).
> 
> (...)

I agree that, in addition to ENT_QUOTES, ENT_HTML401 (which encodes single
quotes as &#039;) is a better default when encoding, but I also think that
ENT_HTML5 (which recognises both &#039; and &apos;) is a better default when
decoding. This should apply not only when $flags is omitted, but also when
none of ENT_HTML401, ENT_HTML5 or ENT_XML1 appears explicitly in the bitmask,
i.e.:

htmlspecialchars($x, ENT_QUOTES); // should be equivalent to ENT_QUOTES | ENT_HTML401

html_entity_decode($x, ENT_QUOTES); // should be equivalent to ENT_QUOTES | ENT_HTML5
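
For comparison, here is a minimal sketch of how the existing flags behave today, as far as I understand the documented behaviour. The variable names and the sample string are only illustrative.

```php
<?php
// Illustrative sample value; any string containing a single quote will do.
$sample = "it's";

// Encoding: omitting the document-type flag behaves like ENT_HTML401.
$implicit = htmlspecialchars($sample, ENT_QUOTES);
$html401  = htmlspecialchars($sample, ENT_QUOTES | ENT_HTML401); // it&#039;s
$html5    = htmlspecialchars($sample, ENT_QUOTES | ENT_HTML5);   // it&apos;s

// Decoding: the named entity is only recognised with ENT_HTML5.
$decoded401 = html_entity_decode('it&apos;s', ENT_QUOTES);             // left as it&apos;s
$decoded5   = html_entity_decode('it&apos;s', ENT_QUOTES | ENT_HTML5); // it's

var_dump($implicit === $html401, $html401, $html5, $decoded401, $decoded5);
```

So the encoding side already matches the first line above, while the decoding side would change under this suggestion.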

The difference between ENT_HTML401 and ENT_HTML5, and its practical effect
(the former is more compatible when encoding, the latter when decoding), is
probably too subtle for most people, assuming they even know that such flags
exist. (In the codebase I maintain, there is a comment somewhere that says
“html_entity_decode does not decode &apos;”, followed by code that handles
that specific entity manually.)
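
In the meantime, a userland workaround could look roughly like the sketch below. The function names encode_html() and decode_html() are hypothetical, not part of PHP; they only hard-code the flag combinations suggested above, so they cover the case where the caller would otherwise pass no document-type flag.

```php
<?php
// Hypothetical userland helpers; the names are illustrative, not part of PHP.

// Encode using the numeric &#039; form, which legacy parsers understand.
function encode_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML401);
}

// Decode accepting both the numeric &#039; and the named &apos; form.
function decode_html(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5);
}

echo encode_html("it's"), "\n";
echo decode_html('it&apos;s and it&#039;s'), "\n";
```

This would replace hand-written handling of &apos; with a single call, at the cost of also decoding the larger set of named entities that ENT_HTML5 recognises.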

—Claude

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
