Hi All,

I am using hivexmlserde-1.0.5.1.jar to load XML data into a Hive table. The table DDL is as follows:

    create external table h_xml (
      member_id string,
      personal_identity map<string,string>,
      personal_contact map<string,string>,
      Enrollment map<string,string>
    )
    row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    with SERDEPROPERTIES (
      "column.xpath.member_id"="/PayLoad/Members/Member/MemberID/text()",
      "column.xpath.personal_identity"="/PayLoad/Members/Member/Demographics/PersonIdentity",
      "column.xpath.personal_contact"="/PayLoad/Members/Member/Demographics/PersonContactInformation",
      "column.xpath.Enrollment"="/PayLoad/Members/Member/Enrollment"
    )
    STORED AS
      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
      "xmlinput.start"="<PayLoad",
      "xmlinput.end"="</PayLoad>"
    );

The sample XML is attached to this email.

Problem: when I execute the query below

    select * from h_xml limit 1 ;

the result shows the member_id values of multiple nodes in the same row:

    hive> select * from h_xml ;
    OK
    <string>02573767-0502573768-01</string> {"PersonIdentity":"<string><SSN>

In fact these are two different member IDs (02573767-05 and 02573768-01) and should show up in 2 rows.

<PayLoad>
  <Members>
    <Member>
      <MemberID>02573767-05</MemberID>
      <ClientCode>AHPPOSI</ClientCode>
      <Demographics>
        <PersonIdentity>
          <SSN>XXXXXX</SSN>
          <Firstname>YYYY</Firstname>
          <LastName>ZZZZ</LastName>
          <DateOfBirth>9999-01-01</DateOfBirth>
          <Gender>F</Gender>
        </PersonIdentity>
        <PersonContactInformation>
          <Address usage="Primary" addressID="1">
            <AddressLineOne>ayz street</AddressLineOne>
            <City>OOOOO</City>
            <StateorProvince>OO</StateorProvince>
            <PostalCode>00000</PostalCode>
            <Country>United States</Country>
            <isPreferred>true</isPreferred>
          </Address>
          <Telephone usage="Primary" associatedAddress="1">
            <AreaCode>000</AreaCode>
            <PhoneNumber>00000000</PhoneNumber>
            <isPreferred>true</isPreferred>
          </Telephone>
        </PersonContactInformation>
      </Demographics>
      <Enrollment>
        <GroupID>9898989</GroupID>
        <GroupName>PPPPPPPP</GroupName>
        <ResponsiblePartyRelationshipCode>DEPENDANT</ResponsiblePartyRelationshipCode>
        <IsPrimary>true</IsPrimary>
      </Enrollment>
    </Member>
    <Member>
      <MemberID>02573768-01</MemberID>
      <ClientCode>CODE</ClientCode>
      <Demographics>
        <PersonIdentity>
          <SSN>XXXXXXXX</SSN>
          <Firstname>TTTTT</Firstname>
          <LastName>PPPPP</LastName>
          <DateOfBirth>1999-01-01</DateOfBirth>
          <Gender>F</Gender>
        </PersonIdentity>
        <PersonContactInformation>
          <Address usage="Primary" addressID="1">
            <AddressLineOne>1111 street</AddressLineOne>
            <City>town</City>
            <StateorProvince>PP</StateorProvince>
            <PostalCode>00000</PostalCode>
            <Country>United States</Country>
            <isPreferred>true</isPreferred>
          </Address>
          <Telephone usage="Primary" associatedAddress="1">
            <AreaCode>333333</AreaCode>
            <PhoneNumber>989898</PhoneNumber>
            <isPreferred>true</isPreferred>
          </Telephone>
        </PersonContactInformation>
      </Demographics>
      <Enrollment>
        <GroupID>11111</GroupID>
        <GroupName>PPPPPPP</GroupName>
        <ResponsiblePartyRelationshipCode>SELF</ResponsiblePartyRelationshipCode>
        <IsPrimary>true</IsPrimary>
      </Enrollment>
    </Member>
  </Members>
</PayLoad>
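
To reproduce the symptom outside Hive, here is a minimal Python sketch using only the standard library. It is not the SerDe's code. It assumes that one record is whatever lies between xmlinput.start and xmlinput.end, and that the text of every node matched within a record is joined into a single value. The helper name member_ids_per_record and the trimmed two-member sample are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample above: two members, IDs unchanged.
SAMPLE = """<PayLoad><Members>
<Member><MemberID>02573767-05</MemberID><ClientCode>AHPPOSI</ClientCode></Member>
<Member><MemberID>02573768-01</MemberID><ClientCode>CODE</ClientCode></Member>
</Members></PayLoad>"""


def member_ids_per_record(xml_text, record_tag):
    """Hypothetical helper: treat each <record_tag> element as one record and
    join the text of every MemberID found inside it (assumed SerDe behaviour)."""
    root = ET.fromstring(xml_text)
    # iter() matches the exact tag name, so "Member" does not match "Members".
    records = list(root.iter(record_tag))
    return ["".join(node.text for node in rec.iter("MemberID")) for rec in records]


# One record per <PayLoad>: both IDs end up in the same value, as in the query output.
whole_payload = member_ids_per_record(SAMPLE, "PayLoad")

# One record per <Member>: each ID gets its own value.
per_member = member_ids_per_record(SAMPLE, "Member")

print(whole_payload)
print(per_member)
```

If that assumption holds, the whole <PayLoad> is being read as a single row, which would explain the concatenated member_id. Delimiting records at "<Member>" and "</Member>" and rooting the column XPaths at /Member may then give one row per member, but I have not verified this against the SerDe.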