Hi Dmitry,

Thank you, it worked.
Is there any reason why we have to specify the lowest possible node to make it work?

Regards,
Rahul

On Monday, August 25, 2014, Dmitry Vasilenko <dvasi...@gmail.com> wrote:

> I would try something like this:
>
> CREATE TABLE h_xml(
>      member_id STRING,
>      personal_identity map<string,string>,
>      enrollment map<string,string>)
>      ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
> WITH SERDEPROPERTIES (
> "column.xpath.member_id"="/Member/MemberID/text()",
> "column.xpath.personal_identity"="/Member/Demographics/PersonIdentity/*",
> "column.xpath.enrollment"="/Member/Enrollment/*"
> )
> STORED AS
>      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
>      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
>      TBLPROPERTIES (
>      "xmlinput.start"="<Member>",
>      "xmlinput.end"="</Member>"
>      );
>
> load data local inpath '/home/dvasilen/Downloads/few_sample_h.xml'
> OVERWRITE into table h_xml;
>
> select * from h_xml;
>
> Here is the output:
>
>     > select * from h_xml;
> OK
> 02573767-05    {"LastName":"ZZZZ","Gender":"F","DateOfBirth":"9999-01-01","Firstname":"YYYY","SSN":"XXXXXX"}    {"ResponsiblePartyRelationshipCode":"DEPENDANT","IsPrimary":"true","GroupID":"9898989","GroupName":"PPPPPPPP"}
> 02573768-01    {"LastName":"PPPPP","Gender":"F","DateOfBirth":"1999-01-01","Firstname":"TTTTT","SSN":"XXXXXXXX"}    {"ResponsiblePartyRelationshipCode":"SELF","IsPrimary":"true","GroupID":"11111","GroupName":"PPPPPPP"}
> Time taken: 0.067 seconds, Fetched: 2 row(s)
> hive>
>
> On 8/25/14, Rahul Channe <drah...@googlemail.com> wrote:
> > Hi All,
> >
> > I am using hivexmlserde-1.0.5.1.jar to load XML data into a Hive table.
> > The table DDL is as follows:
> >
> > create external table h_xml (member_id string,personal_identity
> > map<string,string>,personal_contact map<string,string>,Enrollment
> > map<string,string> )
> > row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
> > with SERDEPROPERTIES (
> > "column.xpath.member_id"="/PayLoad/Members/Member/MemberID/text()",
> > "column.xpath.personal_identity"="/PayLoad/Members/Member/Demographics/PersonIdentity",
> > "column.xpath.personal_contact"="/PayLoad/Members/Member/Demographics/PersonContactInformation",
> > "column.xpath.Enrollment"="/PayLoad/Members/Member/Enrollment"
> > ) STORED AS
> > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
> > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
> > TBLPROPERTIES
> > (
> > "xmlinput.start"="<PayLoad",
> > "xmlinput.end"="</PayLoad>"
> > )
> >
> > The sample XML is attached to this email.
> >
> > Problem: when I execute the query below,
> > select * from h_xml limit 1;
> >
> > the result shows the member_id values of multiple nodes in the same row:
> >
> > hive> select * from h_xml  ;
> > OK
> > <string>02573767-0502573768-01</string> {"PersonIdentity":"<string><SSN>
> >
> > In fact, these are two different member IDs and should appear in two separate rows.
> >
>
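
A rough way to see why the record tags matter, assuming (as the results in this thread suggest) that the input format turns each span between "xmlinput.start" and "xmlinput.end" into one row and that the column XPath expressions are evaluated against that span only. The Python below is an illustrative model of that behaviour, not the SerDe's implementation; SAMPLE, split_records and member_id_column are hypothetical names, and the sample document is a cut-down stand-in for the attached file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample shaped like the XML discussed in the thread.
SAMPLE = """<PayLoad><Members>
<Member><MemberID>02573767-05</MemberID></Member>
<Member><MemberID>02573768-01</MemberID></Member>
</Members></PayLoad>"""


def split_records(text, start, end):
    """Cut text into records that begin with `start` and finish with `end`.

    Assumption: this mimics what the xmlinput.start / xmlinput.end
    table properties do; each returned record stands for one row.
    """
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


def member_id_column(record):
    """Concatenate the text of every MemberID node found in one record."""
    root = ET.fromstring(record)
    return "".join(node.text for node in root.iter("MemberID"))


# Record = whole <PayLoad>: both members fall into a single row.
payload_rows = [member_id_column(r)
                for r in split_records(SAMPLE, "<PayLoad", "</PayLoad>")]

# Record = one <Member>: each member gets a row of its own.
member_rows = [member_id_column(r)
               for r in split_records(SAMPLE, "<Member>", "</Member>")]

print(payload_rows)  # ['02573767-0502573768-01']
print(member_rows)   # ['02573767-05', '02573768-01']
```

Under that assumption, the record tags need to name the element that should become one row, and the column XPath expressions are then written relative to that element ("/Member/..." rather than "/PayLoad/Members/Member/...").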
