Can you run describe on all the aliases that result from a join and post the output?
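Something like the following is what I have in mind. It is a sketch that reuses the loads from your message, with the missing closing parenthesis restored on b's schema. It describes the join output and also a projection that uses the fully qualified a::/b:: field names, so the two schemas can be compared side by side. The data paths are the placeholders from your message.

```pig
-- Diagnostic sketch: same inputs as in the question (placeholder paths).
a = load 'data_a' using PigStorage('\t') as (a1, a2, a3);
b = load 'data_b' using PigStorage('\t') as (b1, b2, b3);

-- Inner join; Pig prefixes each output field with its input alias (a::, b::).
a_b = join a by a1, b by b1;
describe a_b;

-- Projection that names every field by its qualified name, so there is
-- no ambiguity about which input a column comes from.
ab = foreach a_b generate a::a1 as a1, a::a2 as a2, b::b2 as b2;
describe ab;
```

Posting the describe output for both aliases (and for ab as you originally wrote it) should show whether the field names or types change between the join and the projection.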
On Tuesday, May 5, 2015, Tayler Lawrence Jones <[email protected]> wrote:

> The code below works as expected:
>
> a = load 'data_a' using PigStorage('\t') as (a1, a2, a3);
> b = load 'data_b' using PigStorage('\t') as (b1, b2, b3);
> a_b = join a by a1, b by b1; --inner join
>
> When I inspect the fields, they are populated correctly.
>
> However, once I add a projection into the mix, it doesn't work:
>
> a = load 'data_a' using PigStorage('\t') as (a1, a2, a3);
> b = load 'data_b' using PigStorage('\t') as (b1, b2, b3);
> a_b = join a by a1, b by b1; --inner join
> ab = foreach a_b generate a1 as a1, a2 as a2, b2 as b2;
>
> In ab, all cells in the fields from b are NULL.
>
> The same thing happens if I do this:
>
> a = load 'data_a' using PigStorage('\t') as (a1, a2, a3);
> a2 = foreach a generate a1, a2;
> b = load 'data_b' using PigStorage('\t') as (b1, b2, b3);
> b2 = foreach b generate b1, b2;
> ab = join a2 by a1, b2 by b1;
>
> I use the following workaround, but hate being bogged down by the
> store/load:
>
> a = load 'data_a' using PigStorage('\t') as (a1, a2, a3);
> b = load 'data_b' using PigStorage('\t') as (b1, b2, b3);
> a_b = join a by a1, b by b1; --inner join
> store a_b into 'hdfs:///a_b_temp' using PigStorage('\t','-schema');
> a_b2 = load 'hdfs:///a_b_temp' using PigStorage('\t');
> ab = foreach a_b2 generate a1 as a1, a2 as a2, b2 as b2;
>
> With this, the fields in ab do not become NULL. However, if I then group
> and perform aggregations, I typically get the error:
>
> ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR:
> org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
>
> This error goes away if I skip the last projection and keep the
> 'relation::' parts of the field names.
>
> I am new to Pig. Are there any known bugs/issues that could be causing
> this, or am I coding something wrong? I have observed it happening several
> times with different data sets.
>
> I am using Pig 0.12 on Amazon AWS EMR.
>
> Thanks for any help!
