Hi Prashant:

I read about the map data type in the book "Programming Pig". It says:

"... By default there is no requirement that all values in a map must be of the same type. It is legitimate to have a map with two keys name and age, where the value for name is a chararray and the value for age is an int. Beginning in Pig 0.9, a map can declare its values to all be of the same type... "
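As a minimal sketch of what that passage describes, the two kinds of map could be declared roughly like this in Pig Latin. The file names (people.txt, counts.txt) and the field names are hypothetical and are not taken from this thread; the typed form assumes Pig 0.9 or later.

```pig
-- Untyped map: values default to bytearray, so name and age can differ in type.
-- Assumes each line of the hypothetical people.txt looks like [name#alice,age#30].
A = load 'people.txt' as (person:map[]);

-- A value looked up from an untyped map stays a bytearray until it is cast.
B = foreach A generate (chararray)person#'name' as name, (int)person#'age' as age;

-- Typed map (Pig 0.9+): every value is declared to be an int.
-- Assumes each line of the hypothetical counts.txt looks like [x#1,y#2].
C = load 'counts.txt' as (counts:map[int]);
describe C;
```

With the untyped form, Pig knows nothing about the value types at load time, which is why a lookup such as person#'age' needs an explicit cast before it can be used as an int.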
I agree that all values in the map can be of the same type, but this is not required in Pig.

Best Regards,

Jerry

On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <[email protected]> wrote:

> Hi Ruslan:
>
> I used PigStorage to store the data, which originally used Pig data
> types. It is strange (or a bug in Pig) that I cannot read the data using
> PigStorage that has been stored using PigStorage, isn't it?
>
> Best Regards,
>
> Jerry
>
> On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>
>> The output:
>> ({ ([c#11,d#22]),([c#33,d#44]) })
>> ()
>> looks weird.
>>
>> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
>>
>> A load function that parses a line of input into fields using a character
>> delimiter
>>
>> So I guess this is just for simple CSV lines.
>> But you are trying to load a complicated Map structure as it was formatted
>> by previous storing.
>> Probably you'll need to write your own Loader for this. Another hint: using
>> the -schema parameter to PigStorage, but I am not sure it can help:(
>>
>> Ruslan
>>
>> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <[email protected]> wrote:
>>
>> > Hi Ruslan:
>> >
>> > I did a describe B followed by a dump B, the output is:
>> > B: {b: {()}}
>> >
>> > ({ ([c#11,d#22]),([c#33,d#44]) })
>> > ()
>> >
>> > but when I executed
>> >
>> > C = foreach B generate flatten(b);
>> >
>> > dump C;
>> >
>> > I got the exception again...
>> >
>> > 2013-04-17 15:47:39,933 [Thread-26] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
>> > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
>> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> >     at java.lang.Thread.run(Thread.java:680)
>> >
>> > Best Regards,
>> >
>> > Jerry
>> >
>> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>> >
>> > > I think that before doing the FLATTEN, you should be 100% sure that your
>> > > cast worked properly. Can you first DESCRIBE B and then DUMP B right away?
>> > > Or probably it just can't be cast in this way. Honestly I don't know
>> > > exactly how it works, but here:
>> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
>> > > I see that casting from a map to a bag should produce an error.
>> > > Hope that helps.
>> > >
>> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <[email protected]> wrote:
>> > >
>> > > > Hi Ruslan:
>> > > >
>> > > > Thanks for your help. I really appreciate it. It really puzzled me.
>> > > >
>> > > > I did a "describe B", the output is "B: {b: bytearray}".
>> > > >
>> > > > I then tried to cast it as suggested, I got:
>> > > > B = foreach A generate document#'b' as b:{};
>> > > > describe B;
>> > > > B: {b: {()}}
>> > > >
>> > > > Then I proceeded with:
>> > > > C = foreach B generate flatten(b);
>> > > >
>> > > > I got:
>> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
>> > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
>> > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>> > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> > > >     at java.lang.Thread.run(Thread.java:680)
>> > > >
>> > > > Best Regards,
>> > > >
>> > > > Jerry
>> > > >
>> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>> > > >
>> > > > > Hey, and as for converting a map of tuples, probably I got you wrong.
>> > > > > If you can get to every value manually within FOREACH then I see no
>> > > > > problem in doing so.
>> > > > >
>> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>> > > > >
>> > > > > > I am not sure whether you can convert a map to a tuple.
>> > > > > > But I am curious about one thing:
>> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it
>> > > > > > to be a Bag I guess:
>> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
>> > > > > > But it seems that Pig thinks that b is a byte array:
>> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>> > > > > > Can you do this?:
>> > > > > > DESCRIBE B
>> > > > > >
>> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think
>> > > > > > Pig doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
>> > > > > >
>> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <[email protected]> wrote:
>> > > > > >
>> > > > > >> Hi Ruslan,
>> > > > > >>
>> > > > > >> I tried to debug each step already but no luck.
>> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
>> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
>> > > > > >> but it fails when I did C = foreach B generate flatten(b);
>> > > > > >>
>> > > > > >> I don't have control over the input. It is passed as Map of Maps. I guess
>> > > > > >> it makes lookup easier using a map with keys.
>> > > > > >>
>> > > > > >> Can I convert map to tuple?
>> > > > > >>
>> > > > > >> Best Regards,
>> > > > > >>
>> > > > > >> Jerry
>> > > > > >>
>> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <[email protected]> wrote:
>> > > > > >>
>> > > > > >> > Hi Jerry,
>> > > > > >> >
>> > > > > >> > I would recommend to debug the issue step by step. Just after this line:
>> > > > > >> > A = load 'data.txt' as document:[];
>> > > > > >> > and then right after that:
>> > > > > >> > DESCRIBE A;
>> > > > > >> > DUMP A;
>> > > > > >> > and so on...
>> > > > > >> >
>> > > > > >> > To be honest I haven't used maps that much. Just curious, why did you
>> > > > > >> > choose to use them? You can also use regular tuples for storing the
>> > > > > >> > relations. Also you can store the tuples with a schema file.
>> > > > > >> >
>> > > > > >> > Ruslan
>> > > > > >> >
>> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <[email protected]> wrote:
>> > > > > >> >
>> > > > > >> > > Hi pig users,
>> > > > > >> > >
>> > > > > >> > > I tried to load data using PigStorage that was previously stored
>> > > > > >> > > using PigStorage but it failed.
>> > > > > >> > >
>> > > > > >> > > Each line looks like this in the data file that is generated by PigStorage:
>> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
>> > > > > >> > >
>> > > > > >> > > I did the following:
>> > > > > >> > > A = load 'data.txt' as document:[];
>> > > > > >> > > B = foreach A generate document#'b' as b;
>> > > > > >> > > C = foreach B generate flatten(b);
>> > > > > >> > > dump C;
>> > > > > >> > >
>> > > > > >> > > I expect to see the following output:
>> > > > > >> > > ([c#11,d#22])
>> > > > > >> > > ([c#33,d#44])
>> > > > > >> > >
>> > > > > >> > > Instead, I got:
>> > > > > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
>> > > > > >> > > cast to org.apache.pig.data.DataBag
>> > > > > >> > >
>> > > > > >> > > Has anyone encountered this problem before? How can I read the data back?
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jerry
