Hi Dmitry, Thank you, it worked. Any reason why we have to specify the lowest possible node to make it work.
Regards, Rahul On Monday, August 25, 2014, Dmitry Vasilenko <dvasi...@gmail.com> wrote: > I would try something like this: > > CREATE TABLE h_xml(member_id STRING , personal_identity > map<string,string>,enrollment map<string,string>) > ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' > WITH SERDEPROPERTIES ( > "column.xpath.member_id"="/Member/MemberID/text()", > "column.xpath.personal_identity"="/Member/Demographics/PersonIdentity/*", > "column.xpath.enrollment"="/Member/Enrollment/*" > ) > STORED AS > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' > TBLPROPERTIES ( > "xmlinput.start"="<Member>", > "xmlinput.end"="</Member>" > ); > > load data local inpath '/home/dvasilen/Downloads/few_sample_h.xml' > OVERWRITE into table h_xml; > > select * from h_xml; > > Here is the output: > > > select * from h_xml; > OK > 02573767-05 > > {"LastName":"ZZZZ","Gender":"F","DateOfBirth":"9999-01-01","Firstname":"YYYY","SSN":"XXXXXX"} > > {"ResponsiblePartyRelationshipCode":"DEPENDANT","IsPrimary":"true","GroupID":"9898989","GroupName":"PPPPPPPP"} > 02573768-01 > > {"LastName":"PPPPP","Gender":"F","DateOfBirth":"1999-01-01","Firstname":"TTTTT","SSN":"XXXXXXXX"} > > > {"ResponsiblePartyRelationshipCode":"SELF","IsPrimary":"true","GroupID":"11111","GroupName":"PPPPPPP"} > Time taken: 0.067 seconds, Fetched: 2 row(s) > hive> > > > > > > On 8/25/14, Rahul Channe <drah...@googlemail.com <javascript:;>> wrote: > > Hi All, > > > > I am using hivexmlserde-1.0.5.1.jar to load xml data into hive table. > > Following is the table DDL > > > > create external table h_xml (member_id string,personal_identity > > map<string,string>,personal_contact map<string,string>,Enrollment > > map<string,string> ) > > row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe' > > with SERDEPROPERTIES ( > > "column.xpath.member_id"="/PayLoad/Members/Member/MemberID/text()", > > > "column.xpath.personal_identity"="/PayLoad/Members/Member/Demographics/PersonIdentity", > > > "column.xpath.personal_contact"="/PayLoad/Members/Member/Demographics/PersonContactInformation", > > "column.xpath.Enrollment"="/PayLoad/Members/Member/Enrollment" > > ) STORED AS > > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' > > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' > > TBLPROPERTIES > > ( > > "xmlinput.start"="<PayLoad", > > "xmlinput.end"="</PayLoad>" > > ) > > > > > > > > the sample xml is attached in the email. > > > > Problem:- When I am executing the below query > > select * from h_xml limit 1 ; > > > > the result shows me member_id of multiple nodes in the same row > > > > hive> select * from h_xml ; > > OK > > <string>02573767-0502573768-01</string> {"PersonIdentity":"<string><SSN> > > > > in fact they are two different memberId and should show up in 2 rows. > > >