2016-02-12 17:53 GMT+01:00 Edson Richter <edsonrich...@hotmail.com>: > > 2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrich...@hotmail.com>: > >> Hi! >> >> I've some (about 1M records) containing legacy XML I would like to parse >> and apply XMLEXISTS. >> >> This is the query: >> >> select * from xmllog >> where xpath_exists(('//MyDocument[@DocNum = ''000411828'']'::text), >> xmlparse(document cdataout)); >> >> This is the error: >> >> ERRO: could not parse XML document >> SQL state: 2200M >> Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a >> valid URI >> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT"> >> ^ >> >> This is the sample XML with malformed xmlns (I've shortenet the data, but >> the important thing here is the malformed xmlns): >> >> "<?xml version="1.0" encoding="utf-8" ?> >> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT"> >> <carrier xmlns="" controlnum="04503660000146"> >> <MyDocument DocNum="000511852"> >> <other_info>0</other_info> >> <complement info (...)" >> >> >> I can easly read this XML in Notepad++, and also in Java - but PostgreSQL >> always throw error. >> >> Can you plase tell me how can make PostgreSQL ignore this malformed xmlns >> and proceed processing the XML? >> > > PostgreSQL uses libxml2, but the usage isn't too configurable. So my > advice is using defensive strategy and clean/fix wrong namespace with > string tools - replace function. > > Regards > > Pavel > > > Thanks, Pavel. > > I did suspect that. But then I have about 10.000 new records each week, > and I've no control over the system that generates it. > It is a shame, but sometimes we have to live with such problems. >
I understand - you can handle this error and broken xml you can ignore. Or you can use a parser from Python, Perl - PLPerlu or PLPythonu is great for it. Regards Pavel > > Regards, > > Edson Richter > > >