On Thu, 7 Jan 2021 at 14:11, Claude Pache <claude.pa...@gmail.com> wrote:

> Hi,
>
> > On 26 Dec 2020, at 12:02, Craig Francis <cr...@craigfrancis.co.uk> wrote:
> >
> > (...)
> > PHP uses the numeric version &#039; with ENT_QUOTES, and it should continue
> > to do so - because the named version, &apos; was added in HTML5, but can
> > still cause problems with legacy parsers; for example Android 4, and the
> > one still in use by Microsoft Outlook (&amp;/&gt;/&lt; was in the
> > original HTML spec, and &quot; was added in HTML2).
> >
> > (...)
>
> I agree that — in addition to ENT_QUOTES — ENT_HTML401 (which encodes
> quotes as &#039;) is a better default when encoding, but I also think that
> ENT_HTML5 (which recognises both &#039; and &apos;) is a better default
> when decoding.
>



That's a good point about decoding. If I saw:

    echo html_entity_decode(' &#039; &apos; ');

I would expect it to decode both.
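
For reference, a minimal sketch of the difference between the two document-type flags, passing them explicitly so it does not depend on the default (the variable names are only for illustration):

```php
<?php
// Same input as above: one numeric and one named apostrophe entity.
$input = ' &#039; &apos; ';

// HTML 4.01 has no &apos; entity, so only the numeric form is decoded.
$html401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401);

// HTML5 recognises both forms.
$html5 = html_entity_decode($input, ENT_QUOTES | ENT_HTML5);

var_dump($html401); // " ' &apos; "
var_dump($html5);   // " ' ' "
```

The same comparison should apply to `htmlspecialchars_decode`, which takes the same flags.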

I'm tempted to update my PR to use ENT_HTML5 on `html_entity_decode` and
`htmlspecialchars_decode`, to avoid that issue.

But the tests show that it also changes the result for a few other
entities, which I think is fine, but I should check what others think:

  &#x0C;    > DECODED, Form Feed
  &#x0D;    > NOT DECODED, Carriage Return
  &#xFDD0;  > NOT DECODED, Invalid character
  &#xFDEF;  > NOT DECODED, Invalid character
  &#xFFFE;  > NOT DECODED, Invalid character
  &#xFFFF;  > NOT DECODED, Invalid character
  &#x2FFFE; > NOT DECODED, Not a character
  &#x2FFFF; > NOT DECODED, Not a character
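
A small script along these lines could be used to compare the two flags for the entities above. It is only a sketch: it assumes UTF-8, and `$entities` and `$results` are illustrative names, not anything from the PR.

```php
<?php
// Numeric entities taken from the list above.
$entities = [
    '&#x0C;', '&#x0D;', '&#xFDD0;', '&#xFDEF;',
    '&#xFFFE;', '&#xFFFF;', '&#x2FFFE;', '&#x2FFFF;',
];

$results = [];
foreach ($entities as $entity) {
    // An entity that is not decoded comes back unchanged,
    // so "decoded" here means the output differs from the input.
    $results[$entity] = [
        'html401' => html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8') !== $entity,
        'html5'   => html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8') !== $entity,
    ];
}

foreach ($results as $entity => $decoded) {
    printf(
        "%-10s HTML401: %-12s HTML5: %s\n",
        $entity,
        $decoded['html401'] ? 'decoded' : 'not decoded',
        $decoded['html5'] ? 'decoded' : 'not decoded'
    );
}
```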

Craig
