[ https://issues.apache.org/jira/browse/TIKA-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042352#comment-14042352 ]

Hudson commented on TIKA-1353:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #64 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/64/])
TIKA-1353 If a File is available, parse ODF documents with it, so that the metadata can always be processed first (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1605124)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java


> OpenDocumentParser doesn't correctly process metadata
> -----------------------------------------------------
>
>                 Key: TIKA-1353
>                 URL: https://issues.apache.org/jira/browse/TIKA-1353
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.5
>            Reporter: Steve R
>             Fix For: 1.6
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When using OpenDocumentParser, the metadata isn't set correctly. When using
> it to write an html file, the only metadata that it knows about is content
> type because it is set ahead of time.
> The problem is that when iterating over the zip contents, meta.xml isn't
> processed before content.xml. The metadata set on the parse object is correct
> after parse() returns, however the contents of the resulting html file is
> missing all of the metadata.
> Changing the code to be
>     boolean parsedMetaData = false;
>     boolean delayLoadContent = false;
>     while (entry != null) {
>     ...
>         } else if (entry.getName().equals("meta.xml")) {
>             meta.parse(zip, new DefaultHandler(), metadata, context);
>             parsedMetaData = true;
>             if (delayLoadContent) {
>                 if (content instanceof OpenDocumentContentParser) {
>                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
>                 } else {
>                     // Foreign content parser was set:
>                     content.parse(zip, handler, metadata, context);
>                 }
>             }
>         } else if (entry.getName().endsWith("content.xml")) {
>             if (!parsedMetaData) {
>                 delayLoadContent = true;
>             } else {
>                 if (content instanceof OpenDocumentContentParser) {
>                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
>                 } else {
>                     // Foreign content parser was set:
>                     content.parse(zip, handler, metadata, context);
>                 }
>             }
>         }
> works as expected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
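
For readers following the thread, the idea in the commit message (when a File is available, open it so that meta.xml can be read before content.xml, whatever order the entries have in the archive) can be sketched roughly as below. This is only an illustration using java.util.zip.ZipFile from the JDK, not the code from r1605124. The class OdfEntryOrder, the EntryHandler callback and the method processMetadataFirst are hypothetical names standing in for the real meta and content parsers. The sketch also looks entries up by exact name, whereas the snippet quoted above matches content.xml with endsWith().

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of Tika): visits meta.xml before content.xml. */
class OdfEntryOrder {

    /** Hypothetical callback standing in for the meta and content parsers. */
    interface EntryHandler {
        void handle(String name, InputStream stream) throws IOException;
    }

    /**
     * Opens the file with random access and hands the two entries to the
     * handler in a fixed order, regardless of their order in the archive.
     * Returns the names that were actually found and processed.
     */
    static List<String> processMetadataFirst(File file, EntryHandler handler)
            throws IOException {
        List<String> visited = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            // Metadata first, so it is known before any content is emitted.
            for (String name : new String[] {"meta.xml", "content.xml"}) {
                ZipEntry entry = zip.getEntry(name);
                if (entry == null) {
                    continue; // entry missing from this archive, skip it
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    handler.handle(name, in);
                }
                visited.add(name);
            }
        }
        return visited;
    }
}
```

Random access by entry name is what makes the fixed order possible. A purely streamed zip can only be read in archive order, which is the situation the issue description runs into.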