Quoting Bastien ROUCARIES (2021-09-04 20:28:49)
> On Sat, 4 Sep 2021 at 18:03, Jonas Smedegaard <jo...@jones.dk> wrote:
> >
> > Quoting Bastien ROUCARIES (2021-09-04 19:52:50)
> > > On Fri, 3 Sep 2021 at 01:03, Jonas Smedegaard <jo...@jones.dk> wrote:
> > > >
> > > > Quoting Bastien Roucariès (2021-09-02 23:45:30)
> > > > > Perl is an option: I implemented the privacy breach test in perl. The
> > > > > problem is I prefer to drop a debian/package.privacy.xslt file in the
> > > > > package instead of asking the maintainer to code the removal of privacy
> > > > > problems...
> > > > >
> > > > > A generic one could be coded in perl, but for the end side I need
> > > > > something like xslt2
> > > >
> > > > If you are asking how to sloppily parse HTML5 files from upstream source
> > > > and XSLT2 files provided by package maintainers, then with perl you
> > > > could use HTML::HTML5::Parser for the first and XML::Saxon::XSLT2 for
> > > > the second.
> > >
> > > Unfortunately HTML::HTML5::Parser has been RC-buggy for 4 years due to a
> > > bug in handling UTF-8 (#750946)
> > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750946
> >
> > Ouch!
> >
> > I keep forgetting which packages are affected by that annoying bug :-/
> >
> >
> > > Your suggestion will work fine but we need to get some solution for
> > > this utf-8 problem...
> >
> > I have recently grown somewhat more familiar with UTF-8 and perl (in my
> > work towards fixing bug#867305 in licensecheck), and will try to take a
> > fresh look at bug#750946...
>
> The solution is straightforward; I just sent you a mail. Use html5
> sniffing and add an optional parameter to the method to specify the encoding.

It seems to me - and, judging from your posts to the upstream bug report,
you agree - that a "straightforward" solution breaks the API, whereas a
solution which preserves the API is hard.

It is my understanding that upstream considers the current API integral
to the module - i.e. if you want a different API then look for a
different module.

Therefore: What do you think about using HTML5::DOM instead?

It is not yet in Debian, so it will need a "sudo apt install cpanminus; cpanm
HTML5::DOM". If it is useful then I can offer to package it for Debian.


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private