[ 
https://issues.apache.org/jira/browse/TIKA-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II resolved TIKA-1133.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

Resolved in r1491680.
                
> Ability to Allow Empty and Duplicate Tika Values for XML Elements
> -----------------------------------------------------------------
>
>                 Key: TIKA-1133
>                 URL: https://issues.apache.org/jira/browse/TIKA-1133
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.3
>            Reporter: Ray Gauss II
>            Assignee: Ray Gauss II
>             Fix For: 1.4
>
>
> In some cases it is beneficial to allow empty and duplicate Tika metadata 
> values for multi-valued XML elements like RDF bags.
> Consider an example where the original source metadata is structured 
> something like:
> {code}
> <Person>
>   <FirstName>John</FirstName>
>   <LastName>Smith</LastName>
> </Person>
> <Person>
>   <FirstName>Jane</FirstName>
>   <LastName>Doe</LastName>
> </Person>
> <Person>
>   <FirstName>Bob</FirstName>
> </Person>
> <Person>
>   <FirstName>Kate</FirstName>
>   <LastName>Smith</LastName>
> </Person>
> {code}
> Since Tika stores only flat metadata, we transform that structure before 
> invoking a parser into something like:
> {code}
>  <custom:FirstName>
>   <rdf:Bag>
>    <rdf:li>John</rdf:li>
>    <rdf:li>Jane</rdf:li>
>    <rdf:li>Bob</rdf:li>
>    <rdf:li>Kate</rdf:li>
>   </rdf:Bag>
>  </custom:FirstName>
>  <custom:LastName>
>   <rdf:Bag>
>    <rdf:li>Smith</rdf:li>
>    <rdf:li>Doe</rdf:li>
>    <rdf:li></rdf:li>
>    <rdf:li>Smith</rdf:li>
>   </rdf:Bag>
>  </custom:LastName>
> {code}
> The current behavior drops empty and duplicate values, so the LastName bag 
> above shrinks to two entries and we can't tell whether Bob or Kate ever had 
> last names.  Empties or duplicates in other positions shift the later values 
> and map them to the wrong person.
> We should allow the option to create an {{ElementMetadataHandler}} which 
> allows empty and/or duplicate values.
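
The positional problem described above can be illustrated with a minimal sketch. It does not use Tika; `collect`, `pair_up`, and the two flags are hypothetical names chosen for illustration, standing in for a handler that gathers the rdf:li values of one property with or without empties and duplicates.

```python
def collect(values, allow_duplicates, allow_empty):
    """Hypothetical stand-in for a handler gathering rdf:li values of one property."""
    kept = []
    for value in values:
        if not allow_empty and value == "":
            continue  # empty value dropped
        if not allow_duplicates and value in kept:
            continue  # repeated value dropped
        kept.append(value)
    return kept


def pair_up(first_names, last_names):
    """Re-associate the flat bags by position; zip stops at the shorter list."""
    return list(zip(first_names, last_names))


# Values taken from the rdf:Bag example in the issue description.
first_names = ["John", "Jane", "Bob", "Kate"]
last_names = ["Smith", "Doe", "", "Smith"]

strict = collect(last_names, allow_duplicates=False, allow_empty=False)
lenient = collect(last_names, allow_duplicates=True, allow_empty=True)

print(pair_up(first_names, strict))   # Bob and Kate have no partner left
print(pair_up(first_names, lenient))  # all four people keep their position
```

With both flags off, the last-name list no longer lines up with the first-name list, which is the ambiguity the issue describes; with both flags on, each index still refers to the same person.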

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
