Hi Tomas,

[...]

> It already modifies the raw value for regular HTML text:
>
> scheme@(htmlprag)> (html->sxml "a&b")
> $10 = (*TOP* "a&b")
> scheme@(htmlprag)> (sxml->html '(*TOP* "a&b"))
> $13 = "a&b"
>
>
> I now noticed this also affects encoding:
>
> scheme@(htmlprag)> (sxml->html '(*TOP* (a (@ (href "a&b")))))
> $12 = "<a href=\"a&b\"></a>"
>
>
> I am not sure why attributes should be special here.
>
> For what it is worth, (sxml simple) itself decodes even attributes:
>
> scheme@(htmlprag)> (xml->sxml "<a href=\"a&amp;b\"></a>")
> $11 = (*TOP* (a (@ (href "a&b"))))
>
> For comparison, Firefox seems to decode the attributes as well even in
> HTML.  That is actually how I discovered this issue, links I extracted
> from <a href=".."> using html->sxml were not working until I ran a
> decoding pass on them.

Good points.  Thanks for these.

>> Users may have different use cases that require applying different
>> transformations themselves?
>
> I agree in the abstract, but do you have any specific use case in mind
> when you would want to use the raw content of attributes (especially
> since you already cannot get raw content of text nodes)?

>> If we hard-code a decoding scheme ourselves, then we force that
>> choice onto users, no?
>
> I agree we cannot hard-code or change it now due to compatibility
> concerns, but adding #:decode-attributes to html->sxml,
> #:encode-attributes to sxml->html and possibly %deencode-attributes?
> parameter, in the spirit of %strict-tokenizer? would seem reasonable.

I see this situation and %strict-tokenizer as a bit different: the
htmlprag module was designed to be lenient, so being lenient could not
really be considered a bug :-).  This, on the other hand, could well be
considered a bug.  So perhaps we could fix it properly and bump at least
the minor digit of our version.  We're still at an unstable 0.x version
(the last release was 0.2.8.1), so technically we don't promise
stability yet (and perhaps never will, as this guile-lib project aims to
be a lab for components that could later be included in Guile).  But we
should communicate this change well in the NEWS file.

-- 
Thanks,
Maxim


