I think that before doing the FLATTEN, you should be 100% sure that your cast worked properly. Can you DESCRIBE B first and then DUMP B right away? Or perhaps it simply can't be cast this way. Honestly, I don't know exactly how it works, but according to http://pig.apache.org/docs/r0.10.0/basic.html#cast, casting from a map to a bag should produce an error. Hope that helps.
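
For reference, here is a minimal and untested sketch of that check, reusing the relation names from the thread. It assumes that an explicit cast operator, rather than the "as b:{}" schema clause, is what makes Pig actually convert the bytearray. The bag schema bag{tuple(map[])} is only my guess at the shape of the data:

```pig
A = LOAD 'data.txt' AS document:[];

-- Explicit cast of the map value; the bag schema is an assumption
-- based on the sample line [a#hello,b#{([c#11,d#22]),([c#33,d#44])}].
B = FOREACH A GENERATE (bag{tuple(map[])})document#'b' AS b;

DESCRIBE B;  -- check whether b is reported as a bag or still as a bytearray
DUMP B;      -- check that the rows still look right after the cast

-- Only worth trying once DESCRIBE and DUMP above look sane.
C = FOREACH B GENERATE FLATTEN(b);
DUMP C;
```

If DESCRIBE B still reports a bytearray, or DUMP B fails, then the problem is in the cast itself and not in FLATTEN.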

On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <[email protected]> wrote:

> Hi Ruslan:
>
> Thanks for your help. I really appreciate it. It really puzzled me.
>
> I did a "describe B", the output is "B: {b: bytearray}".
>
> I then tried to cast it as suggested, I got:
> B = foreach A generate document#'b' as b:{};
> describe B;
> B: {b: {()}}
>
> Then I proceeded with:
> C = foreach B generate flatten(b);
>
> I got:
> 2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:680)
>
> Best Regards,
>
> Jerry
>
>
> On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > Hey, and as for converting a map of tuples, probably I got you wrong. If
> > you can get to every value manually within FOREACH then I see no problem
> > in doing so.
> >
> >
> > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> >
> > > I am not sure whether you can convert a map to a tuple.
> > > But I am curious about one thing:
> > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to be
> > > a Bag I guess:
> > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > But it seems that Pig thinks that b is a byte array:
> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > > cast to org.apache.pig.data.DataBag
> > > Can you do this?:
> > > DESCRIBE B
> > >
> > > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
> > >
> > >
> > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <[email protected]> wrote:
> > >
> > >> Hi Ruslan,
> > >>
> > >> I tried to debug each step already but no luck.
> > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > >> I got {([c#11,d#22]),([c#33,d#44])}
> > >> but it fails when I did C = foreach B generate flatten(b);
> > >>
> > >> I don't have control over the input. It is passed as Map of Maps. I guess
> > >> it makes lookup easier using a map with keys.
> > >>
> > >> Can I convert map to tuple?
> > >>
> > >> Best Regards,
> > >>
> > >> Jerry
> > >>
> > >>
> > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> > >>
> > >> > Hi Jerry,
> > >> >
> > >> > I would recommend to debug the issue step by step. Just after this line:
> > >> > A = load 'data.txt' as document:[];
> > >> > and then right after that:
> > >> > DESCRIBE A;
> > >> > DUMP A;
> > >> > and so on...
> > >> >
> > >> > To be honest I haven't used maps that much. Just curious, why did you
> > >> > choose to use them? You can also use regular tuples for storing the
> > >> > relations. Also you can store the tuples with a schema file.
> > >> >
> > >> > Ruslan
> > >> >
> > >> >
> > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <[email protected]> wrote:
> > >> >
> > >> > > Hi pig users,
> > >> > >
> > >> > > I tried to load data using PigStorage that was previously stored using
> > >> > > PigStorage but it failed.
> > >> > >
> > >> > > Each line looks like this in the data file that is generated by PigStorage:
> > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > >> > >
> > >> > > I did the following:
> > >> > > A = load 'data.txt' as document:[];
> > >> > > B = foreach A generate document#'b' as b;
> > >> > > C = foreach B generate flatten(b);
> > >> > > dump C;
> > >> > >
> > >> > > I expect to see the following output:
> > >> > > ([c#11,d#22])
> > >> > > ([c#33,d#44])
> > >> > >
> > >> > > Instead, I got:
> > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > >> > > cast to org.apache.pig.data.DataBag
> > >> > >
> > >> > > Has anyone encountered this problem before? How can I read the data back?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jerry
