[ https://issues.apache.org/jira/browse/TIKA-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II resolved TIKA-1133.
--------------------------------

    Resolution: Fixed
    Fix Version/s: 1.4

Resolved in r1491680.

> Ability to Allow Empty and Duplicate Tika Values for XML Elements
> -----------------------------------------------------------------
>
>                 Key: TIKA-1133
>                 URL: https://issues.apache.org/jira/browse/TIKA-1133
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.3
>            Reporter: Ray Gauss II
>            Assignee: Ray Gauss II
>             Fix For: 1.4
>
>
> In some cases it is beneficial to allow empty and duplicate Tika metadata
> values for multi-valued XML elements like RDF bags.
> Consider an example where the original source metadata is structured
> something like:
> {code}
> <Person>
>   <FirstName>John</FirstName>
>   <LastName>Smith</LastName>
> </Person>
> <Person>
>   <FirstName>Jane</FirstName>
>   <LastName>Doe</LastName>
> </Person>
> <Person>
>   <FirstName>Bob</FirstName>
> </Person>
> <Person>
>   <FirstName>Kate</FirstName>
>   <LastName>Smith</LastName>
> </Person>
> {code}
> and since Tika stores only flat metadata we transform that before invoking a
> parser to something like:
> {code}
> <custom:FirstName>
>   <rdf:Bag>
>     <rdf:li>John</rdf:li>
>     <rdf:li>Jane</rdf:li>
>     <rdf:li>Bob</rdf:li>
>     <rdf:li>Kate</rdf:li>
>   </rdf:Bag>
> </custom:FirstName>
> <custom:LastName>
>   <rdf:Bag>
>     <rdf:li>Smith</rdf:li>
>     <rdf:li>Doe</rdf:li>
>     <rdf:li></rdf:li>
>     <rdf:li>Smith</rdf:li>
>   </rdf:Bag>
> </custom:LastName>
> {code}
> The current behavior ignores empty and duplicate values, so the LastName bag
> yields only Smith and Doe and we can't tell whether Bob or Kate ever had last
> names. Empties or duplicates in other positions result in an incorrect
> mapping of data.
> We should allow the option to create an {{ElementMetadataHandler}} which
> allows empty and/or duplicate values.
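
To illustrate the misalignment described in the issue, here is a minimal, self-contained Java sketch. It does not use or reproduce Tika's actual code: the class name BagValueSketch, the method collect, and its two boolean flags are hypothetical stand-ins that only mimic the idea of optionally keeping empty and duplicate rdf:li values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Tika's ElementMetadataHandler.
final class BagValueSketch {

    // Collects bag items in order, optionally dropping empty and duplicate
    // values (an assumed model of the behavior the issue describes).
    static List<String> collect(List<String> items,
                                boolean allowDuplicates,
                                boolean allowEmpty) {
        List<String> out = new ArrayList<>();
        for (String item : items) {
            String value = item.trim();
            if (value.isEmpty() && !allowEmpty) {
                continue; // empty value dropped
            }
            if (!allowDuplicates && out.contains(value)) {
                continue; // repeated value dropped
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> firstNames = Arrays.asList("John", "Jane", "Bob", "Kate");
        List<String> lastNames = Arrays.asList("Smith", "Doe", "", "Smith");

        // Both flags off: only two last names remain for four first names.
        List<String> lossy = collect(lastNames, false, false);
        // Both flags on: one last-name slot per first name, so index i
        // still pairs firstNames.get(i) with its own last name.
        List<String> aligned = collect(lastNames, true, true);

        System.out.println(firstNames.size() + " first names, "
                + lossy.size() + " last names when filtering");
        System.out.println(firstNames.size() + " first names, "
                + aligned.size() + " last names when keeping all values");
    }
}
```

With both flags enabled the two flattened lists keep the same length, so position 2 still says that Bob has no last name and position 3 still gives Kate the last name Smith.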