On Fri, 3 Sept 2021 at 01:03, Jonas Smedegaard <jo...@jones.dk> wrote:
>
> Quoting Bastien Roucariès (2021-09-02 23:45:30)
> > Perl is an option; I implemented the privacy-breach test in Perl. The
> > problem is that I prefer to drop a debian/package.privacy.xslt file in
> > the package instead of asking maintainers to code the removal of
> > privacy problems...
> >
> > A generic one could be coded in Perl, but for the other end I need
> > something like XSLT2.
>
> If you are asking how to sloppily parse HTML5 files from upstream source
> and XSLT2 files provided by package maintainers, then with perl you
> could use HTML::HTML5::Parser for the first and XML::Saxon::XSLT2 for
> the second.
Unfortunately, HTML::HTML5::Parser has been RC-buggy for 4 years because of a bug in its UTF-8 handling (#750946):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750946

Your suggestion would work fine, but we need a solution for this UTF-8 problem...

Bastien

>
> > > I am sure Python/Ruby/PHP/Haskell/Scheme/Rust/etc. folks will argue
> > > that their pet language is the right one for the task as well: I think
> > > it will help the conversation if you clarify what you are open to
> > > and what your constraints are.
> > >
> > > E.g. do you mean that it *must* be JavaScript when you mention that?
> > > Or are you perhaps asking if someone else wants to take over the
> > > challenge from you, so it does not matter how it is done?
> >
> > No, it need not be JavaScript, but I would like to use V8 or something
> > like browser internals, so that a broken HTML file is handled when
> > building the DOM tree the same way a browser handles it. But maybe I
> > am being overcautious.
>
> If you are asking how to parse HTML5 files like a web browser, then with
> perl you could use Gtk3::WebKit2 for that.
>
>
> - Jonas
>
> --
> * Jonas Smedegaard - idealist & Internet architect
> * Tel.: +45 40843136 Website: http://dr.jones.dk/
>
> [x] quote me freely [ ] ask before reusing [ ] keep private
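For illustration only, here is a minimal Python sketch of the detection side of such a privacy-breach check. It assumes that a "privacy breach" means an element that makes a browser fetch something from a remote host without user action. The `FETCHING_ATTRS` table and the names `is_remote`, `PrivacyBreachFinder` and `find_privacy_breaches` are hypothetical and are not taken from Lintian. It uses the standard-library `html.parser`, which keeps going on malformed markup but is not a spec-compliant HTML5 tree builder, so it does not reproduce browser error recovery.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete table: the attribute of each tag that
# makes a browser fetch a resource automatically. <a href> is left out because
# following a link needs a user click. This is coarse: every <link href> is
# flagged, even rel values that browsers do not fetch.
FETCHING_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "link": "href",
    "audio": "src",
    "video": "src",
    "source": "src",
    "embed": "src",
}


def is_remote(url):
    """True for absolute http(s) URLs and protocol-relative '//host/...' URLs."""
    parsed = urlparse(url.strip())
    if parsed.scheme in ("http", "https"):
        return True
    # '//host/path' has no scheme but does have a network location.
    return parsed.scheme == "" and parsed.netloc != ""


class PrivacyBreachFinder(HTMLParser):
    """Collects (line, tag, url) for every element that would load a remote resource."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.breaches = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; valueless attributes have value None.
        wanted = FETCHING_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and is_remote(value):
                line, _offset = self.getpos()  # 1-based line of the start tag
                self.breaches.append((line, tag, value))

    # Self-closing tags such as <img ... /> go through handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


def find_privacy_breaches(html_text):
    """Parse already-decoded HTML text and return the list of remote fetches found."""
    finder = PrivacyBreachFinder()
    finder.feed(html_text)
    finder.close()
    return finder.breaches
```

The input must already be a decoded `str` (for example `data.decode("utf-8")`), since `html.parser` does not guess the encoding itself. The removal side (applying a per-package XSLT2 stylesheet) is not covered; Python's standard library has no XSLT processor.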