RE: Converting xml to csv

william.dowling Fri, 13 Sep 2013 11:46:38 -0700

Ajay's suggestion will work for elements like <employee_id> in your example, 
which fit entirely on one line. If you want the whole <employee> element, and 
it spans more than one line, matching with (.*) will not work, because by 
default '.' does not match a newline character.


You can remove newline characters using
B = foreach A generate REPLACE(x,'[\\n]','');
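
To make the newline behaviour concrete, here is a small sketch in Python rather 
than Pig. It only illustrates the regex semantics: Pig's regex functions are 
backed by Java regular expressions, and Python's re module treats '.' and the 
inline (?s) flag the same way. The sample record and the extract() helper are 
made up for illustration.

```python
import re

# Hypothetical sample record, modelled on the <employee> example in this thread.
record = "<employee>\n    <employee_id>1234</employee_id>\n</employee>"


def extract(pattern, text):
    """Return capture group 1 of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


# Default: '.' stops at a newline, so the multi-line element is not matched.
default_result = extract(r"<employee>(.*)</employee>", record)

# Option 1: strip newlines first (the REPLACE approach above), then match.
flattened = re.sub(r"[\n]", "", record)
flattened_result = extract(r"<employee>(.*)</employee>", flattened)

# Option 2: the inline (?s) flag makes '.' match newlines as well.
dotall_result = extract(r"(?s)<employee>(.*)</employee>", record)

print(default_result)
print(flattened_result)
print(repr(dotall_result))
```

In Pig, the equivalent of option 2 would presumably be 
REGEX_EXTRACT(x,'(?s)<employee>(.*)</employee>',1), since Java regular 
expressions accept the same inline flag. I have not run that against your data.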


William F Dowling
Senior Technologist
Thomson Reuters


-----Original Message-----
From: ajay kumar [mailto:[email protected]] 
Sent: Friday, September 13, 2013 2:21 AM
To: [email protected]
Subject: Re: Converting xml to csv

try this ...

register /usr/lib/pig/piggybank.jar
A = load '/home/sudeep/Desktop/test1' using
org.apache.pig.piggybank.storage.XMLLoader('employee_id') as (x:chararray);
B = foreach A generate REGEX_EXTRACT(x,'<employee_id>(.*)</employee_id>',1);


On Fri, Sep 13, 2013 at 3:54 AM, jamal sasha <[email protected]> wrote:

> Hi,
>  I am trying to parse the following XML
>
>
>  <employee>
>     <employee_id>1234</employee_id>
>     <email>[email protected]</email>
>     <name>(first_name_1234,middle_initial_1234,last_name_1234)</name>
>     <projects>{(project_1234_1),(project_1234_2),(project_1234_3)}</projects>
>     <skills>[programming:SQL,rdbms:Oracle]</skills>
>   </employee>
>
> And my script is
>
> a = LOAD 'sample.xml' USING
> org.apache.pig.piggybank.storage.XMLLoader('employee') as (x:chararray);
> B = foreach a generate REGEX_EXTRACT(x,'<employee>(.*)</employee>',1);
> dump B;
> Now B is an empty tuple.
> Not sure what I am missing?
>
>
>
>
> On Wed, Sep 11, 2013 at 11:35 PM, ajay kumar <[email protected]> wrote:
>
> > Use org.apache.pig.piggybank.storage.XMLLoader and then extract them using
> > REGEX_EXTRACT_ALL
> >
> >
> > On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[email protected]>
> > wrote:
> >
> > > Umm.. yes.. but how do I generalize it?
> > > What I am looking for is something like a JSON parser in, say, Java:
> > > if I give it a valid JSON string, I can parse it and then access it
> > > as a hashmap.
> > > But with the XML loader, I still have to specify regex rules?
> > >
> > > Actually, is it possible to just flatten the xml..
> > > so for example
> > > convert
> > > <aux>
> > > <foobar>1</foobar>
> > > <fushbar>foo</fushbar>
> > > </aux>
> > > to
> > > <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> > > ???
> > >
> > >
> > >
> > >
> > > On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[email protected]>
> > > wrote:
> > >
> > > > Use piggybank xmlloader
> > > >  On 12/09/2013 10:14 AM, "jamal sasha" <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >   So I have different xml data sources...For example:
> > > > >
> > > > > src1.txt
> > > > >
> > > > > <foo>
> > > > > <bar>1</bar>
> > > > > </foo>
> > > > > <foo>
> > > > > <bar>2</bar>
> > > > > </foo>
> > > > > .. and so on
> > > > >
> > > > >
> > > > > and another data
> > > > >
> > > > > src2.txt
> > > > >
> > > > > <aux>
> > > > > <foobar>1</foobar>
> > > > > <fushbar>foo</fushbar>
> > > > > </aux>
> > > > >
> > > > > ... and so on
> > > > >
> > > > >
> > > > > So basically different XML (valid formats)
> > > > >
> > > > > Rather than writing different pig scripts.. is there a way to write 1
> > > > > script and then convert all these xml data into csv?
> > > > > Thanks
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > *Thanks & Regards,*
> > *S. Ajay Kumar
> > +91-9966159106*
> >
>



-- 
*Thanks & Regards,*
*S. Ajay Kumar
+91-9966159106*
