[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854313#comment-17854313 ]
Maruan Sahyoun commented on PDFBOX-5835:
----------------------------------------

The schemas used in the document are parsed depending on which element type they are part of, and they are handled in the inner class NamespaceFinder. This is done by calling nsfinder.push with the Element being parsed, which is kept on an internal stack. So if the code doesn't expect a schema declaration at that location in the structure, the schema is not part of the internal structure and is reported as missing later on, when it is used.

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5835
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5835
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>
> I've got a PDF where parsing the metadata fails with an IllegalArgumentException:
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a QName
>     at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>     at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>     at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>     at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
>     // taken from file test-landscape2.pdf
>     String xmpmeta = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>             "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?><x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"FIS/xee\">\n" +
>             " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
>             " <rdf:Description xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>             " <pdfaid:part>3</pdfaid:part>\n" +
>             " <pdfaid:conformance>A</pdfaid:conformance>\n" +
>             " </rdf:Description>\n" +
>             " <rdf:Description xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\" xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\" xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\" xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\" xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\" rdf:about=\"\"/>\n" +
>             " <rdf:Description>\n" +
>             " <schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\">\n" +
>             " <rdf:Bag>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">ZUGFeRD PDFA Extension Schema</schema>\n" +
>             " <namespaceURI xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#</namespaceURI>\n" +
>             " <prefix xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</prefix>\n" +
>             " <property xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">\n" +
>             " <rdf:Seq>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentFileName</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">name of the embedded XML invoice file</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentType</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">INVOICE</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Version</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The actual version of the ZUGFeRD data</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">ConformanceLevel</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The conformance level of the ZUGFeRD data</description>\n" +
>             " </rdf:li>\n" +
>             " </rdf:Seq>\n" +
>             " </property>\n" +
>             " </rdf:li>\n" +
>             " </rdf:Bag>\n" +
>             " </schemas>\n" +
>             " </rdf:Description>\n" +
>             " <rdf:Description xmlns:zf=\"urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\" rdf:about=\"\" zf:ConformanceLevel=\"EXTENDED\" zf:DocumentFileName=\"ZUGFeRD-invoice.xml\" zf:DocumentType=\"INVOICE\" zf:Version=\"1.0\"/>\n" +
>             " </rdf:RDF>\n" +
>             "</x:xmpmeta><?xpacket end=\"w\"?>\n";
>     DomXmpParser xmpParser = new DomXmpParser();
>     xmpParser.setStrictParsing(false);
>     // encode explicitly as UTF-8 to match the XML declaration instead of relying on the platform default
>     XMPMetadata xmp = xmpParser.parse(xmpmeta.getBytes(java.nio.charset.StandardCharsets.UTF_8));
> }
> {code}
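The stack behaviour described in the comment can be illustrated with a minimal sketch in plain Java. It is not PDFBox's NamespaceFinder. The class name NamespaceScopeSketch, its push/pop/resolve methods, and the rule that only declarations made on rdf:Description are recorded are all hypothetical simplifications. The sketch shows how a namespace declared on an element the parser does not expect never reaches the stack. In the report, that element is the default namespace on the nested schemas element, so a later lookup for it comes back empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a per-element namespace stack.
// It is NOT PDFBox's NamespaceFinder; it only mirrors the behaviour
// described in the comment.
class NamespaceScopeSketch {

    // Assumption for illustration: only declarations made on these element
    // types are recorded; declarations on any other element are ignored.
    private static final Set<String> EXPECTED_ELEMENTS = Set.of("rdf:Description");

    // One prefix -> namespace URI map per open element, innermost on top.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // Opens a scope for the element being parsed.
    void push(String elementName, Map<String, String> xmlnsDeclarations) {
        Map<String, String> scope = new HashMap<>();
        if (EXPECTED_ELEMENTS.contains(elementName)) {
            scope.putAll(xmlnsDeclarations);
        }
        scopes.push(scope);
    }

    // Closes the scope of the element that has just been parsed.
    void pop() {
        scopes.pop();
    }

    // Looks the prefix up from the innermost to the outermost scope
    // ("" stands for the default namespace); null means never recorded.
    String resolve(String prefix) {
        for (Map<String, String> scope : scopes) { // ArrayDeque iterates from the top of the stack
            String uri = scope.get(prefix);
            if (uri != null) {
                return uri;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NamespaceScopeSketch finder = new NamespaceScopeSketch();

        // Declared on rdf:Description: recorded and resolvable.
        finder.push("rdf:Description",
                Map.of("zf", "urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#"));
        System.out.println("zf -> " + finder.resolve("zf"));

        // Default namespace declared on the nested <schemas> element:
        // dropped at push time, so it is "missing" when looked up later.
        finder.push("schemas", Map.of("", "http://www.aiim.org/pdfa/ns/extension/"));
        System.out.println("default -> " + finder.resolve(""));

        finder.pop();
        finder.pop();
    }
}
```

A real parser also has to handle inherited declarations and attribute prefixes. The sketch only isolates the point that a declaration filtered out at push time cannot be resolved afterwards.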
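The stack trace in the report can be reproduced with JDK classes alone. A namespace-aware DOM parser reports a null prefix for an element that only uses a default namespace, such as the schemas element in the test. The three-argument javax.xml.namespace.QName constructor rejects a null prefix with exactly the message shown above. That DomHelper.getQName passes the element's prefix straight to that constructor is an assumption drawn from the trace. The class NullPrefixDemo and its helper toQNameSafely, which maps a missing prefix to XMLConstants.DEFAULT_NS_PREFIX, are hypothetical and are not the fix applied in PDFBox.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NullPrefixDemo {

    // Parses an XML fragment with a namespace-aware DOM parser and returns its root element.
    static Element parseRoot(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
        try {
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("cannot parse: " + xml, e);
        }
    }

    // Hypothetical null-safe variant: an element without a prefix gets the
    // empty default prefix instead of null.
    static QName toQNameSafely(Element element) {
        String prefix = element.getPrefix();
        return new QName(element.getNamespaceURI(), element.getLocalName(),
                prefix == null ? XMLConstants.DEFAULT_NS_PREFIX : prefix);
    }

    public static void main(String[] args) {
        // Same pattern as in the report: a default namespace and no prefix.
        Element schemas = parseRoot("<schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\"/>");
        System.out.println("prefix: " + schemas.getPrefix());

        try {
            // Passing the DOM prefix straight through fails for unprefixed elements.
            new QName(schemas.getNamespaceURI(), schemas.getLocalName(), schemas.getPrefix());
        } catch (IllegalArgumentException e) {
            System.out.println("QName rejected: " + e.getMessage());
        }

        System.out.println("safe: " + toQNameSafely(schemas));
    }
}
```

There are two places a real fix could live: normalizing the prefix at QName creation, or making the namespace lookup aware of default-namespace declarations on nested elements. The sketch only demonstrates the JDK behaviour that turns the missing prefix into the reported exception.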