Hi,
I use lynx to convert HTML to plain-text, but noticed an issue where part of
the output is missing with UTF-8 in CDATA sections.
Below is a small test-case to reproduce it:
Works correctly:
a’b
Doesn't work correctly:
The UTF-8 byte sequence for this code point (U+2019, the right single quotation mark) can be produced with: printf '\342\200\231'
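For anyone who wants to double-check which character those octal escapes encode, here is a minimal sketch in Python. It only decodes the three bytes from the printf command above; the variable names are my own, and it does not involve lynx at all.

```python
import unicodedata

# The three octal escapes from the printf command: \342 \200 \231
raw = bytes([0o342, 0o200, 0o231])

# Decode the bytes as UTF-8; this yields a single character.
ch = raw.decode("utf-8")

print(raw.hex(" "))          # the same bytes in hexadecimal
print(f"U+{ord(ch):04X}")    # the Unicode code point
print(unicodedata.name(ch))  # the official character name
```

This should report the bytes e2 80 99, i.e. one three-byte UTF-8 sequence, so the character is a multi-byte one as far as any parser is concerned.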
On Thu, Jul 27, 2023 at 10:25:13PM +0200, Hiltjo Posthuma wrote:
> Hi,
>
> I use lynx to convert HTML to plain-text, but noticed an issue where part of
> the output is missing with UTF-8 in CDATA sections.
>
> Below is a small test-case to reproduce it:
>
> Works corre