[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854313#comment-17854313
 ] 

Maruan Sahyoun commented on PDFBOX-5835:
----------------------------------------

The schemas used in the document are parsed depending on which element type 
they are part of; this is handled in the inner class NamespaceFinder. The 
parser calls nsfinder.push with the Element being parsed, and that Element is 
kept on an internal stack. So if the code doesn't expect a schema to be 
declared at that location in the structure, the schema never becomes part of 
the internal structure and is reported as missing later on, when it is used.
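
To make the stack idea above concrete, here is a minimal sketch. It is not 
the actual NamespaceFinder code from XmpBox: the class name 
NamespaceStackSketch, its methods, and the use of plain maps instead of DOM 
Elements are all assumptions made for illustration. It only shows why a 
namespace that was never pushed at an expected location cannot be found 
later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a namespace stack; NOT the XmpBox NamespaceFinder. */
class NamespaceStackSketch {
    // One frame per element being parsed: prefix -> namespace URI declared on it.
    private final Deque<Map<String, String>> frames = new ArrayDeque<>();

    /** Called when the parser enters an element whose declarations it expects. */
    void push(Map<String, String> declarations) {
        frames.push(new HashMap<>(declarations));
    }

    /** Called when the parser leaves that element again. */
    void pop() {
        frames.pop();
    }

    /** True only if a frame currently on the stack declared this namespace URI. */
    boolean containsNamespace(String uri) {
        for (Map<String, String> frame : frames) {
            if (frame.containsValue(uri)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NamespaceStackSketch finder = new NamespaceStackSketch();

        // Declarations found on an element the parser did push.
        Map<String, String> description = new HashMap<>();
        description.put("pdfaid", "http://www.aiim.org/pdfa/ns/id/");
        finder.push(description);

        // Known, because its declaring element is on the stack.
        System.out.println(finder.containsNamespace("http://www.aiim.org/pdfa/ns/id/"));

        // Unknown, because no pushed element declared it.
        System.out.println(finder.containsNamespace("http://www.aiim.org/pdfa/ns/extension/"));

        finder.pop();
    }
}
```

In this sketch a lookup can only succeed for declarations on elements that 
were pushed and not yet popped, which mirrors the "reported as missing later 
on" behaviour described above.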

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5835
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5835
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>
> I've got a PDF where parsing the metadata fails with an 
> IllegalArgumentException:
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a QName
>       at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>       at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>       at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>       at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
>     @Test
>     void testDomXmpParser() throws XmpParsingException
>     {
>         // taken from file test-landscape2.pdf
>         String xmpmeta = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>                 "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?><x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"FIS/xee\">\n" +
>                 " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
>                 " <rdf:Description xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>                 "   <pdfaid:part>3</pdfaid:part>\n" +
>                 "   <pdfaid:conformance>A</pdfaid:conformance>\n" +
>                 "  </rdf:Description>\n" +
>                 "  <rdf:Description xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\" xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\" xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\" xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\" xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\" rdf:about=\"\"/>\n" +
>                 "  <rdf:Description>\n" +
>                 "   <schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\">\n" +
>                 "    <rdf:Bag>\n" +
>                 "     <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "      <schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">ZUGFeRD PDFA Extension Schema</schema>\n" +
>                 "      <namespaceURI xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#</namespaceURI>\n" +
>                 "      <prefix xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</prefix>\n" +
>                 "      <property xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">\n" +
>                 "       <rdf:Seq>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentFileName</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">name of the embedded XML invoice file</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentType</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">INVOICE</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Version</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The actual version of the ZUGFeRD data</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">ConformanceLevel</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The conformance level of the ZUGFeRD data</description>\n" +
>                 "        </rdf:li>\n" +
>                 "       </rdf:Seq>\n" +
>                 "      </property>\n" +
>                 "     </rdf:li>\n" +
>                 "    </rdf:Bag>\n" +
>                 "   </schemas>\n" +
>                 "  </rdf:Description>\n" +
>                 "  <rdf:Description xmlns:zf=\"urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\" rdf:about=\"\" zf:ConformanceLevel=\"EXTENDED\" zf:DocumentFileName=\"ZUGFeRD-invoice.xml\" zf:DocumentType=\"INVOICE\" zf:Version=\"1.0\"/>\n" +
>                 " </rdf:RDF>\n" +
>                 "</x:xmpmeta><?xpacket end=\"w\"?>\n";
>         DomXmpParser xmpParser = new DomXmpParser();
>         xmpParser.setStrictParsing(false);
>         XMPMetadata xmp = xmpParser.parse(xmpmeta.getBytes());
>     }
> {code}
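
For readers wondering where a null prefix can come from in the report quoted 
above: elements such as <schemas xmlns="..."> use a default namespace, and 
a namespace-aware DOM parser reports no prefix for them. The sketch below 
uses only JDK classes to reproduce that situation. It is an assumption that 
this is what happens inside DomHelper.getQName; the stack trace only shows 
that the QName constructor was reached from there. The class NullPrefixDemo 
and the method tolerantQName are hypothetical and are not the fix applied in 
PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical demo using only JDK classes; not XmpBox code. */
class NullPrefixDemo {

    /** Parses the XML with a namespace-aware DOM parser and returns the root. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    /** Passes the DOM prefix straight through; fails when the prefix is null. */
    static QName naiveQName(Element element) {
        return new QName(element.getNamespaceURI(), element.getLocalName(),
                element.getPrefix());
    }

    /** Maps a missing prefix to the empty default prefix before building the QName. */
    static QName tolerantQName(Element element) {
        String prefix = element.getPrefix();
        return new QName(element.getNamespaceURI(), element.getLocalName(),
                prefix == null ? XMLConstants.DEFAULT_NS_PREFIX : prefix);
    }

    public static void main(String[] args) {
        // Same shape as the <schemas> element in the report: default namespace, no prefix.
        Element schemas = parseRoot(
                "<schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\"/>");

        System.out.println("prefix = " + schemas.getPrefix());

        try {
            naiveQName(schemas);
        } catch (IllegalArgumentException e) {
            System.out.println("naive: " + e.getMessage());
        }

        System.out.println("tolerant: " + tolerantQName(schemas));
    }
}
```

Whether defaulting the prefix is the right behaviour for the parser is a 
separate question; the sketch only shows that the exception message matches 
a null prefix reaching the QName constructor.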


