Hi All,

I am using hivexmlserde-1.0.5.1.jar to load XML data into a Hive table.
The table DDL is as follows:

CREATE EXTERNAL TABLE h_xml (
  member_id string,
  personal_identity map<string,string>,
  personal_contact map<string,string>,
  Enrollment map<string,string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.member_id"="/PayLoad/Members/Member/MemberID/text()",
  "column.xpath.personal_identity"="/PayLoad/Members/Member/Demographics/PersonIdentity",
  "column.xpath.personal_contact"="/PayLoad/Members/Member/Demographics/PersonContactInformation",
  "column.xpath.Enrollment"="/PayLoad/Members/Member/Enrollment"
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<PayLoad",
  "xmlinput.end"="</PayLoad>"
)



The sample XML is included at the end of this email.

Problem: when I execute the query below,

select * from h_xml limit 1;

the result shows the member_id values of multiple Member nodes in the same row:

hive> select * from h_xml  ;
OK
<string>02573767-0502573768-01</string> {"PersonIdentity":"<string><SSN>

In fact, these are two different MemberID values (02573767-05 and 02573768-01) and should show up in two separate rows.

Sample XML:
  <PayLoad>
    <Members>
      <Member>
        <MemberID>02573767-05</MemberID>
        <ClientCode>AHPPOSI</ClientCode>
        <Demographics>
          <PersonIdentity>
            <SSN>XXXXXX</SSN>
            <Firstname>YYYY</Firstname>
            <LastName>ZZZZ</LastName>
            <DateOfBirth>9999-01-01</DateOfBirth>
            <Gender>F</Gender>
          </PersonIdentity>
          <PersonContactInformation>
            <Address usage="Primary" addressID="1">
              <AddressLineOne>ayz street</AddressLineOne>
              <City>OOOOO</City>
              <StateorProvince>OO</StateorProvince>
              <PostalCode>00000</PostalCode>
              <Country>United States</Country>
              <isPreferred>true</isPreferred>
            </Address>
            <Telephone usage="Primary" associatedAddress="1">
              <AreaCode>000</AreaCode>
              <PhoneNumber>00000000</PhoneNumber>
              <isPreferred>true</isPreferred>
            </Telephone>
          </PersonContactInformation>
        </Demographics>
        <Enrollment>
          <GroupID>9898989</GroupID>
          <GroupName>PPPPPPPP</GroupName>
          <ResponsiblePartyRelationshipCode>DEPENDANT</ResponsiblePartyRelationshipCode>
          <IsPrimary>true</IsPrimary>
        </Enrollment>
        
      </Member>
      <Member>
        <MemberID>02573768-01</MemberID>
        <ClientCode>CODE</ClientCode>
        <Demographics>
          <PersonIdentity>
            <SSN>XXXXXXXX</SSN>
            <Firstname>TTTTT</Firstname>
            <LastName>PPPPP</LastName>
            <DateOfBirth>1999-01-01</DateOfBirth>
            <Gender>F</Gender>
          </PersonIdentity>
          <PersonContactInformation>
            <Address usage="Primary" addressID="1">
              <AddressLineOne>1111 street</AddressLineOne>
              <City>town</City>
              <StateorProvince>PP</StateorProvince>
              <PostalCode>00000</PostalCode>
              <Country>United States</Country>
              <isPreferred>true</isPreferred>
            </Address>
            <Telephone usage="Primary" associatedAddress="1">
              <AreaCode>333333</AreaCode>
              <PhoneNumber>989898</PhoneNumber>
              <isPreferred>true</isPreferred>
            </Telephone>
          </PersonContactInformation>
        </Demographics>
        <Enrollment>
          <GroupID>11111</GroupID>
          <GroupName>PPPPPPP</GroupName>
          <ResponsiblePartyRelationshipCode>SELF</ResponsiblePartyRelationshipCode>
          <IsPrimary>true</IsPrimary>
        </Enrollment>

      </Member>
    </Members>
  </PayLoad>
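
A possible explanation, stated as an assumption rather than a confirmed diagnosis: with xmlinput.start set to <PayLoad, the input format hands the whole PayLoad element to the SerDe as a single record, so the member_id XPath matches both MemberID nodes and their text ends up in one row. A minimal sketch of an alternative is shown below. It assumes that each Member element should become one row and that the XPaths are evaluated against the extracted Member fragment. The table name h_xml_member is hypothetical, and this DDL has not been run against the sample data.

```sql
-- Sketch only: one record per <Member> element instead of one per <PayLoad>.
-- Assumption: column XPaths are evaluated against the extracted <Member> fragment.
-- h_xml_member is a hypothetical table name; add a LOCATION clause as needed.
-- The start marker includes the closing ">" so that it does not also match
-- <Members> or <MemberID>.
CREATE EXTERNAL TABLE h_xml_member (
  member_id string,
  personal_identity map<string,string>,
  personal_contact map<string,string>,
  Enrollment map<string,string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.member_id"="/Member/MemberID/text()",
  "column.xpath.personal_identity"="/Member/Demographics/PersonIdentity",
  "column.xpath.personal_contact"="/Member/Demographics/PersonContactInformation",
  "column.xpath.Enrollment"="/Member/Enrollment"
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<Member>",
  "xmlinput.end"="</Member>"
);
```

The intent is that a query such as select member_id from h_xml_member returns each MemberID in its own row. Whether it does depends on the assumptions above holding for this SerDe version.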
