Bug#976884: UDD: encoding issue in bug titles

2020-12-08 Thread Lucas Nussbaum
Package: qa.debian.org
Severity: normal
User: qa.debian@packages.debian.org
Usertags: udd

19:36 < jcristau> looks like something's messing up my bug title in https://buildd.debian.org/status/package.php?p=firefox&suite=sid
20:14 < bunk> string sanitizing to prevent exploits through bug titles?
20:34 < aurel32> that or encoding issues
20:34 < aurel32> it's taken from udd
20:38 < aurel32> and there is the same issue in udd, at least in the web interface
21:16 < aurel32> and udd returns:  976731 | firefox: FTBFS on arm64 (explicit specialization in non-namespace scope â\u0080\u0098class js::wasm::BaseRegAllocâ\u0080\u0099)
21:19 < aurel32> so this just looks like a UDD issue when importing the data
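For reference, one mechanism consistent with the damaged title is this: the UTF-8 bytes of U+2018 and U+2019 (E2 80 98 and E2 80 99) were read back as Latin-1 (ISO-8859-1) characters, which yields â followed by the C1 control characters U+0080 and U+0098/U+0099. The minimal Python sketch below reproduces that pattern and undoes one round of it. The helper names are hypothetical and not part of UDD. It assumes Latin-1 rather than Windows-1252, because Windows-1252 would have turned byte 0x80 into € instead of U+0080.

```python
# Hypothetical helpers (not part of UDD). Assumption: the damage is
# "UTF-8 bytes misread as Latin-1", applied exactly once.

def double_encode(text: str) -> str:
    """Reproduce the damage: encode as UTF-8, then misread each byte as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of the damage; raises UnicodeError if text was not damaged this way."""
    return text.encode("latin-1").decode("utf-8")


def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if reversing one round yields different, valid UTF-8 text."""
    try:
        return repair(text) != text
    except UnicodeError:
        return False


title = ("firefox: FTBFS on arm64 (explicit specialization in non-namespace "
         "scope \u2018class js::wasm::BaseRegAlloc\u2019)")
garbled = double_encode(title)

print(ascii(garbled))  # the quotes appear as \xe2\x80\x98 ... \xe2\x80\x99
assert "\u00e2\u0080\u0098class" in garbled
assert repair(garbled) == title
assert looks_double_encoded(garbled) and not looks_double_encoded(title)
```

The heuristic is only a hint. Legitimate Latin-1 text that happens to form valid UTF-8 would also be flagged, so any bulk repair of stored titles should be reviewed by hand.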



2020-12-08 Thread Felix Lechner
Hi,

> â\u0080\u0098class

I had a very similar issue in Lintian's database a few weeks ago. It
was caused by uploading (properly UTF-8 encoded) JSON to Postgres. The
Perl driver DBD::Pg encoded the data a second time and chose the 7-bit
clean \uXXXX escape sequences permitted by RFC 4627, which I believe is
what you are seeing. They are described further here:

https://metacpan.org/pod/JSON::PP#ascii
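To illustrate what 7-bit clean JSON escaping looks like, note that Python's json module has a comparable switch, ensure_ascii. The sketch below uses Python rather than Perl and does not follow DBD::Pg's actual code path. It shows that the escapes are lossless when the JSON is parsed exactly once. If a consumer instead treats the serialized JSON as plain text, literal \uXXXX sequences remain.

```python
import json

title = "scope \u2018class js::wasm::BaseRegAlloc\u2019"

# ensure_ascii=True (the default) writes every non-ASCII character as a
# \uXXXX escape, so the serialized text is 7-bit clean.
escaped = json.dumps(title, ensure_ascii=True)
# ensure_ascii=False keeps the characters as-is.
raw = json.dumps(title, ensure_ascii=False)

print(escaped)  # "scope \u2018class js::wasm::BaseRegAlloc\u2019"
assert escaped.isascii() and not raw.isascii()

# Parsed exactly once, both serializations yield the original string.
assert json.loads(escaped) == json.loads(raw) == title

# Treated as plain text instead of being parsed, the escapes leak through.
assert "\\u2018class" in escaped
```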

My solution was to disable the automatic decoding layer in DBD::Pg via
'pg_enable_utf8 => 0'. It mirrors how I handle encoding elsewhere
(i.e. explicitly and without PerlIO, Bug#972878) and works great, but
did not enjoy much support on IRC in either the Perl or the Postgres
communities.

Kind regards
Felix Lechner