Hi Tomas, [...]
> It already modifies the raw value for regular HTML text:
>
> scheme@(htmlprag)> (html->sxml "a&b")
> $10 = (*TOP* "a&b")
> scheme@(htmlprag)> (sxml->html '(*TOP* "a&b"))
> $13 = "a&b"
>
>
> I now noticed this also affects encoding:
>
> scheme@(htmlprag)> (sxml->html '(*TOP* (a (@ (href "a&b")))))
> $12 = "<a href=\"a&b\"></a>"
>
>
> I am not sure why attributes should be special here.
>
> For what it is worth, (sxml simple) itself decodes even attributes:
>
> scheme@(htmlprag)> (xml->sxml "<a href=\"a&b\"></a>")
> $11 = (*TOP* (a (@ (href "a&b"))))
>
> For comparison, Firefox seems to decode the attributes as well even in
> HTML. That is actually how I discovered this issue, links I extracted
> from <a href=".."> using html->sxml were not working until I ran a
> decoding pass on them.

Good points. Thanks for these.

>> Users may have different use cases requiring to apply different
>> transformation themselves?
>
> I agree in the abstract, but do you have any specific use case in mind
> when you would want to use the raw content of attributes (especially
> since you already cannot get raw content of text nodes).

>> If we hard-code a decoding scheme ourselves, then force that choice
>> onto users, no?
>
> I agree we cannot hard-code or change it now due to compatibility
> concerns, but adding #:decode-attributes to html->sxml,
> #:encode-attributes to sxml->html and possibly %deencode-attributes?
> parameter, in the spirit of %strict-tokenizer? would seem reasonable.

I see this situation and %strict-tokenizer as a bit different; the
htmlprag module was designed to be lenient, so being lenient could not
really be considered a bug :-). But this here could well be considered
a bug.
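For reference, the kind of decoding pass you describe could be sketched
roughly as below. This is only an illustration under my own
assumptions, not what htmlprag does: decode-entities and
decode-attributes are hypothetical names, and only four named entities
are handled (no numeric character references).

```scheme
(use-modules (ice-9 match)
             (ice-9 regex))

;; Hypothetical table, deliberately incomplete: a real pass would also
;; need numeric references and the other named entities.
(define %entities
  '(("amp" . "&") ("lt" . "<") ("gt" . ">") ("quot" . "\"")))

;; Replace each known entity in STR in a single left-to-right pass, so
;; that "&amp;lt;" becomes "&lt;" and is not decoded a second time.
(define (decode-entities str)
  (regexp-substitute/global
   #f "&(amp|lt|gt|quot);" str
   'pre
   (lambda (m) (assoc-ref %entities (match:substring m 1)))
   'post))

;; Walk an SXML tree and decode string values found in (@ ...)
;; attribute lists; text nodes and everything else are left alone.
(define (decode-attributes sxml)
  (match sxml
    (('@ attrs ...)
     (cons '@
           (map (match-lambda
                  ((name (? string? value))
                   (list name (decode-entities value)))
                  (other other))
                attrs)))
    ((head . tail)
     (cons head (map decode-attributes tail)))
    (other other)))
```

Something like (decode-attributes (html->sxml ...)) would then be the
user-side workaround until the module handles this itself.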
So perhaps something we could do is fix this correctly, and bump at
least the minor digit in our version. We're still in an unstable 0
version (the last one was 0.2.8.1), so technically we don't promise
stability yet (perhaps never, as this guile-lib project aims to be a
lab for components that could later be included in Guile). But we
should communicate this change well in the NEWS file.

-- 
Thanks,
Maxim