swedebugia <swedebu...@riseup.net> writes:
> On 2019-01-23 16:58, Ricardo Wurmus wrote: >> >> swedebugia <swedebu...@riseup.net> writes: >> >>>> The second “link” tag opens but is never closed. This may be valid >>>> HTML, but it is not valid XML, which is what xml->sxml expects. >>> >>> Thanks for the quick answer! >>> I will try to remove this line before handling over to the parser. >> >> I would recommend looking for a better source of package information. >> Parsing HTML is not fun and is often brittle. > > I understand. Hm. Will try asking the author. > > Got a little further. Added this: > > (define (sanitize-html html) > "Correct an offending invalid line from the html source" > (let* ((html1 (regexp-substitute #f (string-match "main.css\">" html) > 'pre "main.css\" />" 'post)) > (result (regexp-substitute #f (string-match "utf-8\">" html1) > 'pre "utf-8\" />" 'post))) > result)) It’s generally a bad idea to use regular expressions on HTML or XML. Be careful. > sxml/simple.scm:143:4: In procedure loop: > Throw to key `parser-error' with args `(#<input: string 24fdaf0> > "[wf-entdeclared] broken for " copy)'. I guess this is about the © entity. You may have to tell xml->sxml about these HTML entities. -- Ricardo