I changed the Hive column data type of cust_address from ARRAY to STRUCT,
then loaded the table into Pig.

Now I am able to separate the fields, as shown below:

grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
grunt> describe Z;
Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street: chararray,city: chararray)}


grunt> Y = foreach Z generate cust_address.house_no as house_no, cust_address.street as street, UPPER(cust_address.city) as city;
grunt> describe Y;
Y: {house_no: int,street: chararray,city: chararray}

grunt> dump Y;
(2200,benjamin franklin,PHILADELPHIA)
(44,atlanta franklin,FLORIDA)
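For anyone finding this thread later, here is a rough plain-Python model (not Pig code) of the two shapes discussed below, using the sample rows from the original question. It assumes, as the `describe` output in this thread suggests, that HCatLoader exposes a Hive ARRAY<string> as a bag of single-field tuples and a Hive STRUCT as a tuple with named fields. The helper names `bag_to_tuple` and `address_from_bag` are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple

# Hive ARRAY<string> as seen in Pig: a bag of one-field tuples.
# Modeled as a list here; real Pig bags do not guarantee element order.
array_rows = [
    (123, "phil abc", [("2200",), ("benjamin avenue",), ("philadelphia",)]),
    (124, "diego arty", [("44",), ("atlanta franklin",), ("florida",)]),
]


def bag_to_tuple(bag):
    """Concatenate the fields of every inner tuple into one flat tuple,
    mirroring the BagToTuple output shown in this thread."""
    return tuple(field for inner in bag for field in inner)


class Address(NamedTuple):
    """Model of a Hive STRUCT<house_no:int, street:string, city:string>."""
    house_no: int
    street: str
    city: str


def address_from_bag(bag):
    """Hypothetical helper: map positions 0, 1, 2 onto named, typed fields.
    Raises ValueError if the array does not have exactly three elements."""
    house_no, street, city = bag_to_tuple(bag)
    return Address(int(house_no), street, city)


for _cust_id, _name, bag in array_rows:
    addr = address_from_bag(bag)
    # Same projection as: generate house_no, street, UPPER(city)
    print((addr.house_no, addr.street, addr.city.upper()))
```

With an ARRAY, the meaning of each position lives only in your code, and the unpacking breaks as soon as an array has a different length. With a STRUCT, the names and types are part of the schema, which is why `cust_address.house_no` style projection works above.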


On Mon, Jun 2, 2014 at 1:09 PM, Rahul Channe <[email protected]> wrote:

> grunt> B = foreach A generate BagToTuple(cust_address);
>
> grunt> describe B;
> B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}
>
> grunt> dump B;
> ((2200,benjamin franklin,philadelphia))
> ((44,atlanta franklin,florida))
>
> On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota <[email protected]>
> wrote:
>
>> If you're using the built-in BagToTuple UDF, then you probably don't need
>> the FLATTEN operator.
>>
>> I suspect that your output looks as follows:
>>
>> 2200
>> benjamin avenue
>> philadelphia
>> ...
>>
>> Can you confirm that this is what you're seeing?
>>
>>
>> On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe <[email protected]>
>> wrote:
>>
>> > Thank you, Pradeep. It worked to a certain extent, but I'm having the following
>> > difficulty separating the fields as $0, $1 for cust_address.
>> >
>> >
>> > Example:
>> >
>> > grunt> describe A;
>> > A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
>> >
>> > grunt> dump A;
>> >
>> > (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},[email protected])
>> > (124,diego arty,{(44),(atlanta franklin),(florida)},[email protected])
>> >
>> > grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
>> > grunt> dump B;
>> > (2200,benjamin franklin,philadelphia)
>> > (44,atlanta franklin,florida)
>> >
>> > grunt> describe B;
>> > B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield: chararray}
>> >
>> >
>> >
>> > I am not able to separate the fields in B as $0, $1 and $2. I tried using
>> > STRSPLIT, but it didn't work.
>> >
>> >
>> >
>> > On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota <[email protected]>
>> > wrote:
>> >
>> > > There was a similar question to this on StackOverflow a while back. The
>> > > suggestion was to write a custom BagToTuple UDF.
>> > >
>> > > http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
>> > >
>> > >
>> > > On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota <[email protected]>
>> > > wrote:
>> > >
>> > > > Please disregard my last email.
>> > > >
>> > > > Sorry, I didn't fully understand the question.
>> > > >
>> > > >
>> > > > On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota <[email protected]>
>> > > > wrote:
>> > > >
>> > > >> FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi All,
>> > > >>>
>> > > >>> I have imported a Hive table with a complex data type (ARRAY<String>)
>> > > >>> into Pig. The alias in Pig looks as below:
>> > > >>>
>> > > >>> grunt> describe A;
>> > > >>> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
>> > > >>>
>> > > >>> grunt> dump A;
>> > > >>>
>> > > >>> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},[email protected])
>> > > >>> (124,diego arty,{(44),(atlanta franklin),(florida)},[email protected])
>> > > >>>
>> > > >>> cust_address is the ARRAY field from Hive. I want to FLATTEN
>> > > >>> cust_address into separate fields.
>> > > >>>
>> > > >>>
>> > > >>> Expected output:
>> > > >>> (2200,benjamin avenue,philadelphia)
>> > > >>> (44,atlanta franklin,florida)
>> > > >>>
>> > > >>> Please help.
>> > > >>>
>> > > >>> Regards,
>> > > >>> Rahul
>> > > >>>
>> > > >>
>> > > >>
>> > > >
>> > >
>> >
>>
>
>
