Alas, I couldn't think of a really clever way of doing the items table, so I think it'll need a Perl-based solution.
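Purely as an illustration of what a script-based check of item data might involve, here is a rough sketch. It is written in Python rather than Perl, and it assumes item rows have already been fetched as dicts keyed by items table column name. `find_bad_items` and that input shape are hypothetical, not an existing Koha API. It uses the same negated XML 1.0 character class as the bib report quoted below.

```python
import re

# Negated XML 1.0 "Char" class: matches any character that XML cannot contain.
# Same ranges as the biblio_metadata REGEXP report.
INVALID_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_bad_items(item_rows):
    """Yield (itemnumber, column, 'U+XXXX') for each text column of each item
    that contains at least one invalid XML character.

    Hypothetical input: an iterable of dicts keyed by items-table column name.
    Only the first offending character per column is reported.
    """
    for row in item_rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip NULLs, numbers, dates, etc.
            match = INVALID_XML.search(value)
            if match:
                yield row.get("itemnumber"), column, f"U+{ord(match.group()):04X}"


if __name__ == "__main__":
    rows = [
        {"itemnumber": 1, "itemnotes": "ok", "barcode": "123"},
        {"itemnumber": 2, "itemnotes": "bad\x0cnote", "barcode": "456"},
    ]
    for problem in find_bad_items(rows):
        print(problem)
```

Reporting the column as well as the item number is a deliberate choice here: it tells whoever fixes the data exactly which field to edit, rather than just which record will fail to index.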
I have a RepairRecord plugin, so I might do a version in that first, and if that goes well I could look at upstreaming a patch.

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

From: Koha-devel <koha-devel-boun...@lists.koha-community.org> On Behalf Of David Cook via Koha-devel
Sent: Friday, 12 April 2024 11:36 AM
To: 'Koha-devel' <koha-devel@lists.koha-community.org>
Subject: [Koha-devel] Finding invalid XML characters in Koha data via SQL

Hi all,

I just wanted to share a (MariaDB) SQL report that I wrote for finding bib records with invalid XML characters:

select biblionumber from biblio_metadata where metadata REGEXP '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';

Newer versions of Koha strip invalid characters from the XML so that you can fix your records. I figure this report is very valuable when coupled with that functionality. In fact, I just advised a library today to use them together to fix up some bad data in their catalogue.

--

On a related note, I've noticed that you can have a record with good bib XML but invalid item XML, and you won't notice until your record fails to be indexed. So I'm planning on writing a report for that too.

I'm thinking it might be good to add these reports to core Koha, so that people can find and fix their own metadata problems.

What do people think?
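The character class in the report above is the XML 1.0 "Char" production (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF), negated so that any match is a character XML cannot contain. A minimal Python sketch of the same test, which might help when checking a single field value outside the database, could look like this. The names `XML_CHAR_RANGES`, `is_xml_char` and `first_invalid_char` are hypothetical and not part of Koha.

```python
# The XML 1.0 "Char" production as inclusive (start, end) code point ranges.
# These are the same ranges the SQL report's negated character class allows.
XML_CHAR_RANGES = [
    (0x09, 0x09),        # tab
    (0x0A, 0x0A),        # line feed
    (0x0D, 0x0D),        # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(ch: str) -> bool:
    """True if the single character ch may appear in an XML 1.0 document."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def first_invalid_char(text: str):
    """Return (index, 'U+XXXX') for the first invalid character, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            return i, f"U+{ord(ch):04X}"
    return None


if __name__ == "__main__":
    print(first_invalid_char("Title\x1bwith escape"))  # (5, 'U+001B')
    print(first_invalid_char("Caf\u00e9\tok"))          # None
```

Note that control characters such as ESC (U+001B), which often sneak in from legacy MARC imports, fall outside every range, as do U+FFFE, U+FFFF and lone surrogates.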
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/