Tomas Volf <~@wolfsden.cz> writes:
> I think I found a bug in the htmlprag module in guile-lib.  When parsing
> attributes, the values are not properly decoded:

Thank you for the report!

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> ,use (htmlprag)
> scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
> $1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
> scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
> $2 = (*TOP* (a (@ (href "a&amp;b"))))
> --8<---------------cut here---------------end--------------->8---
>
> I think that $1 should be "bbb\"ccc'ddd" and $2 should be "a&b".

The reverse direction (sxml->html) does encode attribute values, so the
round trip is broken. This is definitely a bug:

scheme@(guile-user)> ,use (htmlprag)
scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
$1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
scheme@(guile-user)> (sxml->html '(*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd")))))
$2 = "<hr aaa=\"bbb&quot;ccc'ddd\" />"
scheme@(guile-user)> (sxml->html '(*TOP* (hr (@ (aaa "bbb\"ccc'ddd")))))
$3 = "<hr aaa=\"bbb&quot;ccc'ddd\" />"

scheme@(guile-user)> (html->sxml (sxml->html '(*TOP* (hr (@ (aaa "bbb\"ccc'ddd"))))))
$4 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
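
The round-trip check could be wrapped in a small predicate, for example as
the seed of a regression test. This is only a sketch: the name
round-trips? is made up here, and it assumes guile-lib is installed so
that (htmlprag) can be loaded.

```scheme
(use-modules (htmlprag))

;; Hypothetical helper: #t when serializing an SXML tree and parsing
;; the result gives back an equal tree.
(define (round-trips? sxml)
  (equal? sxml (html->sxml (sxml->html sxml))))

;; Going by the transcript above, this currently evaluates to #f,
;; because the parsed attribute value still contains "&quot;".
(round-trips? '(*TOP* (hr (@ (aaa "bbb\"ccc'ddd")))))
```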

> I see few ways forward:
>
> 1. Document the current behavior and keep it as it is.
> 2. Add argument #:decode-attributes, defaulting to #f, to the relevant
>    procedures, so that people can opt into the fixed behavior.
> 3. Introduce parameter %decode-attributes, so that people can opt into
>    the fixed behavior.
>
> I am sure there are also other approaches possible.

Since htmlprag already uses parameters for customization
(%strict-tokenizer?), option 3 sounds best to me.

http://git.savannah.nongnu.org/gitweb/?p=guile-lib.git;a=blob;f=src/htmlprag.scm;h=79a7b2f33b0755474bfc015912c01bdf6c676a15;hb=HEAD#l44

(But I’m not the maintainer, so others may have a different opinion.)
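
To make option 3 a bit more concrete, here is a rough, self-contained
sketch of how such a parameter could gate the decoding. It is not wired
into htmlprag: %decode-attributes is the name proposed in the report,
while entity-table, decode-entities and attribute-value are names made up
for this illustration, and the entity table is deliberately tiny.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical parameter from option 3; it does not exist in htmlprag.
;; Defaulting to #f keeps the current behavior.
(define %decode-attributes (make-parameter #f))

;; Illustrative subset only.  A real fix would need the full set of
;; named entities plus numeric character references.
(define entity-table
  '(("amp" . "&") ("quot" . "\"") ("apos" . "'") ("lt" . "<") ("gt" . ">")))

(define entity-rx (make-regexp "&([a-zA-Z]+);"))

;; Replace known named entities in a single pass; unknown ones are
;; left untouched.
(define (decode-entities str)
  (regexp-substitute/global
   #f entity-rx str
   'pre
   (lambda (m)
     (or (assoc-ref entity-table (match:substring m 1))
         (match:substring m 0)))
   'post))

;; Hypothetical hook for the place where the parser stores an
;; attribute value.
(define (attribute-value str)
  (if (%decode-attributes)
      (decode-entities str)
      str))

;; Default: unchanged, as today.
(attribute-value "a&amp;b")            ; => "a&amp;b"

;; Opt in for a dynamic extent.
(parameterize ((%decode-attributes #t))
  (attribute-value "a&amp;b"))         ; => "a&b"
```

With a parameter, callers can opt in around existing calls via
parameterize without any change to procedure signatures, which is the
same way the existing customization works.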

Can you create a patch?

Best wishes,
Arne
-- 
Being apolitical
means being political
without noticing it.
draketo.de
