Tomas Volf <~@wolfsden.cz> writes:

> I think I found a bug in the htmlprag module in guile-lib. When parsing
> attributes, the values are not properly decoded:

Thank you for the report!

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> ,use (htmlprag)
> scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
> $1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
> scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
> $2 = (*TOP* (a (@ (href "a&amp;b"))))
> --8<---------------cut here---------------end--------------->8---
>
> I think that $1 should be "bbb\"ccc'ddd" and $2 should be "a&b".

The other way round does encode, so the round-trip is broken and this
definitely is a bug:

> ,use (htmlprag)
> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
$1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
> (sxml->html '(*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd")))))
$2 = "<hr aaa=\"bbb&quot;ccc'ddd\" />"
> (sxml->html '(*TOP* (hr (@ (aaa "bbb\"ccc'ddd")))))
$3 = "<hr aaa=\"bbb&quot;ccc'ddd\" />"
> (html->sxml (sxml->html '(*TOP* (hr (@ (aaa "bbb\"ccc'ddd"))))))
$4 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))

> I see few ways forward:
>
> 1. Document the current behavior and keep it as it is.
> 2. Add argument #:decode-attributes, defaulting to #f, to the relevant
>    procedures, so that people can opt into the fixed behavior.
> 3. Introduce parameter %decode-attributes, so that people can opt into
>    the fixed behavior.
>
> I am sure there are also other approaches possible.

Since htmlprag already uses parameters for customization
(%strict-tokenizer?), option 3 sounds best to me.

http://git.savannah.nongnu.org/gitweb/?p=guile-lib.git;a=blob;f=src/htmlprag.scm;h=79a7b2f33b0755474bfc015912c01bdf6c676a15;hb=HEAD#l44

(but I’m not the maintainer, so others may have a different opinion)

Can you create a patch?

Best wishes,
Arne
-- 
Being apolitical means being political without noticing it.
draketo.de
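
For illustration only, here is a rough sketch of what the opt-in in option 3 could look like. It does not use htmlprag at all, and every name in it (%decode-attributes?, %entities, decode-entities, maybe-decode-attribute) is made up for this example, not existing guile-lib API. The entity table is deliberately incomplete and numeric references are not handled.

```scheme
;; Hypothetical sketch: none of these names exist in htmlprag.

;; Opt-in switch in the parameter style of option 3; off by default
;; so existing callers keep the current behavior.
(define %decode-attributes? (make-parameter #f))

;; Small, deliberately incomplete table of named entities.
(define %entities
  '(("amp" . #\&) ("quot" . #\") ("apos" . #\') ("lt" . #\<) ("gt" . #\>)))

(define (decode-entities str)
  "Replace known named entities in STR and leave everything else untouched."
  (let loop ((start 0) (acc '()))
    (let ((amp (string-index str #\& start)))
      (if (not amp)
          ;; No further "&": emit the remaining tail and join the pieces.
          (string-concatenate-reverse (cons (substring str start) acc))
          (let* ((semi (string-index str #\; amp))
                 (hit  (and semi
                            (assoc (substring str (+ amp 1) semi)
                                   %entities))))
            (if hit
                ;; Known entity: keep the text before it, then the character.
                (loop (+ semi 1)
                      (cons* (string (cdr hit))
                             (substring str start amp)
                             acc))
                ;; Unknown or unterminated entity: keep the "&" literally.
                (loop (+ amp 1)
                      (cons (substring str start (+ amp 1)) acc))))))))

(define (maybe-decode-attribute value)
  "Decode VALUE only when the caller has opted in via %decode-attributes?."
  (if (%decode-attributes?)
      (decode-entities value)
      value))
```

With this, (maybe-decode-attribute "a&amp;b") returns its argument unchanged by default, while wrapping the call in (parameterize ((%decode-attributes? #t)) ...) yields "a&b". A real patch would hook something like this into the attribute-value path of the tokenizer rather than post-processing strings.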