Hello,

I think I found a bug in the htmlprag module in guile-lib.  When parsing
attributes, character entity references in the values (such as &quot;
and &amp;) are not decoded:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use (htmlprag)
scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
$1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
$2 = (*TOP* (a (@ (href "a&amp;b"))))
--8<---------------cut here---------------end--------------->8---

I would expect the attribute value in $1 to be "bbb\"ccc'ddd" and the
one in $2 to be "a&b".

The annoying part is that the default behavior cannot really be changed
now, because people (me included) already have workarounds in place, and
decoding automatically would make those workarounds decode the values a
second time.
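
For illustration, a workaround of this kind might look roughly like the
sketch below.  It post-processes the SXML tree and decodes entity
references in attribute values only.  The names %entities,
decode-entities, decode-attribute and decode-attributes are made up for
this example and are not part of htmlprag, and the entity table is
deliberately tiny (five named entities, no numeric references).

```scheme
(use-modules (ice-9 regex))

;; Hypothetical, deliberately small table of named entities.
(define %entities
  '(("amp" . "&") ("lt" . "<") ("gt" . ">")
    ("quot" . "\"") ("apos" . "'")))

(define entity-rx (make-regexp "&([A-Za-z]+);"))

;; Decode known entity references in STR in a single pass, so that
;; "&amp;quot;" becomes "&quot;" and not "\"".  Unknown references are
;; left untouched.
(define (decode-entities str)
  (regexp-substitute/global
   #f entity-rx str
   'pre
   (lambda (m)
     (or (assoc-ref %entities (match:substring m 1))
         (match:substring m 0)))
   'post))

;; Decode a single (name "value") attribute; anything else is returned
;; unchanged.
(define (decode-attribute attr)
  (if (and (pair? attr) (pair? (cdr attr)) (string? (cadr attr)))
      (list (car attr) (decode-entities (cadr attr)))
      attr))

;; Walk an SXML tree and decode the values inside (@ ...) attribute
;; lists.  Text nodes are left alone.
(define (decode-attributes tree)
  (cond
   ((not (pair? tree)) tree)
   ((eq? (car tree) '@)
    (cons '@ (map decode-attribute (cdr tree))))
   (else (map decode-attributes tree))))

(decode-attributes '(*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd")))))
;; => (*TOP* (hr (@ (aaa "bbb\"ccc'ddd"))))
```

If html->sxml started decoding on its own, running something like
decode-attributes on its result would decode the values twice.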

I see a few ways forward:

1. Document the current behavior and keep it as it is.
2. Add a keyword argument #:decode-attributes, defaulting to #f, to the
   relevant procedures, so that people can opt into the fixed behavior.
3. Introduce a parameter %decode-attributes, so that people can opt into
   the fixed behavior.
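
Options 2 and 3 could even be combined, with the keyword argument
defaulting to the value of the parameter.  A rough sketch of how the
opt-in might look follows.  attribute-value and decode-amp are
hypothetical stand-ins, not htmlprag procedures, and decode-amp handles
only &amp; to keep the example short.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical parameter from option 3.  #f keeps today's behavior.
(define %decode-attributes (make-parameter #f))

;; Stand-in decoder for illustration only: handles just "&amp;".
(define (decode-amp str)
  (regexp-substitute/global #f "&amp;" str 'pre "&" 'post))

;; Hypothetical stand-in for the place where an attribute value is
;; produced.  The keyword argument from option 2 falls back to the
;; parameter when it is not given.
(define* (attribute-value raw
                          #:key (decode-attributes (%decode-attributes)))
  (if decode-attributes
      (decode-amp raw)
      raw))

(attribute-value "a&amp;b")                          ; => "a&amp;b"
(attribute-value "a&amp;b" #:decode-attributes #t)   ; => "a&b"
(parameterize ((%decode-attributes #t))
  (attribute-value "a&amp;b"))                       ; => "a&b"
```

Either way, existing callers that pass nothing keep getting the raw
values, so current workarounds continue to work.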

I am sure there are also other approaches possible.

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


