Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

Marcel de Rooy via Koha-devel Thu, 11 Apr 2024 23:03:42 -0700

+1
________________________________
From: Koha-devel <koha-devel-boun...@lists.koha-community.org> on behalf of David Cook 
via Koha-devel <koha-devel@lists.koha-community.org>
Sent: Friday, 12 April 2024 03:36
To: 'Koha-devel' <koha-devel@lists.koha-community.org>
Subject: [Koha-devel] Finding invalid XML characters in Koha data via SQL



Hi all,



I just wanted to share a (MariaDB) SQL report that I wrote for finding bib 
records with invalid XML characters:

select biblionumber from biblio_metadata where metadata REGEXP 
'[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';



Newer versions of Koha strip invalid characters from the XML so that you can fix 
your records. I figure this report is very valuable when coupled with that 
functionality. In fact, I advised a library just today to use the two together to 
fix up some bad data in their catalogue.
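
If you want to see exactly which characters the report is tripping on in a 
given record, something like the following rough Python sketch might help. It 
assumes you have already pulled the metadata string out of biblio_metadata 
yourself; the function name find_invalid_xml_chars is made up for illustration 
and is not part of Koha. The character class is the same set of ranges as in 
the SQL above (the XML 1.0 "Char" production).

```python
import re

# Anything outside the XML 1.0 "Char" ranges, i.e. the same negated
# character class as the SQL report, written with Python escapes.
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 does not allow.

    Hypothetical helper for illustration only; not a Koha function.
    """
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in INVALID_XML_CHARS.finditer(text)
    ]


if __name__ == "__main__":
    # A vertical tab (U+000B) hidden in a title is a typical offender.
    sample = "<subfield code=\"a\">Bad\x0btitle</subfield>"
    for offset, codepoint in find_invalid_xml_chars(sample):
        print(offset, codepoint)
```

The offsets are character positions in the string, which should make it easier 
to find the offending spot when editing the record.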



--



On a related note, I’ve noticed that you can have a record with good bib XML 
but invalid item XML, and you won’t notice until your record fails to be 
indexed. So I’m planning on writing a report for that too.



I’m thinking it might be good to add these reports to core Koha, so that people 
can find and fix their own metadata problems. What do people think?



David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/
