Hi Jerry,

Sorry, I misled you a bit with my suggestions :)

As for your last question: it was interesting for me to investigate the issue. Here is what I found:
https://issues.apache.org/jira/browse/PIG-2216
https://issues.apache.org/jira/browse/PIG-2315

So in
"B = foreach A generate document#'b' as b:bag{};"
you are not casting, just renaming, because of the misleading Pig syntax/behaviour :(
The declared schema changes (your DESCRIBE showed B: {b: {()}}), but the value itself stays a bytearray, which is why FLATTEN fails later.
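An untested sketch of the difference, assuming the same data.txt and field names used earlier in this thread. The aliases B_cast and B_renamed are made up for illustration, and the behaviour described in the comments is what this thread observed on a Pig version affected by the two JIRA issues; other versions may differ.

```pig
-- Sketch only: 'data.txt' and the key 'b' come from this thread;
-- the alias names B_cast and B_renamed are hypothetical.
A = load 'data.txt' as document:[];

-- Explicit cast: the bytearray map value is converted to a bag.
B_cast = foreach A generate (bag{})document#'b' as b;

-- Type in the AS clause: only the declared schema changes here;
-- the value is still a bytearray on the affected versions.
B_renamed = foreach A generate document#'b' as b:bag{};

-- Compare what Pig believes the schemas are.
describe B_cast;
describe B_renamed;

-- FLATTEN needs a real bag, so use the relation built with the cast.
C = foreach B_cast generate flatten(b);
dump C;
```

Flattening B_renamed instead of B_cast is the case that produced the ClassCastException reported in this thread.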
Ruslan

On Fri, Apr 19, 2013 at 2:57 AM, Jerry Lam <[email protected]> wrote:

> Hi Prashant:
>
> Just trying to understand my mistake...
> I thought "B = foreach A generate document#'b' as b:bag{};" will cast
> bytearray to bag because of b:bag{}. If I understand correctly, that is
> not what actually happens. Am I correct?
>
> Best Regards,
>
> Jerry
>
> On Thu, Apr 18, 2013 at 5:41 PM, Prashant Kommireddi <[email protected]> wrote:
>
> > Hi Jerry,
> >
> > Like I mentioned in my earlier email, "Map values by default are bytearrays.
> > If you need them to be any other type, you would need to define it
> > explicitly."
> >
> > The difference between the 2 statements is that one does a cast to "bag"
> > and the other leaves the value as a bytearray (the default).
> >
> > On Thu, Apr 18, 2013 at 2:14 PM, Jerry Lam <[email protected]> wrote:
> >
> > > Hi Prashant:
> > >
> > > IT WORKS! THANKS!
> > > What is the difference between:
> > > "B = foreach A generate (bag{})document#'b' as b;"
> > > and
> > > "B = foreach A generate document#'b' as b:bag{};"
> > > ?
> > >
> > > The latter gives error: java.lang.ClassCastException:
> > > org.apache.pig.data.DataByteArray cannot be cast to
> > > org.apache.pig.data.DataBag
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > > On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi <[email protected]> wrote:
> > >
> > > > Well, let me rephrase - the values all have to be the same type if you
> > > > choose to read all columns in a similar way. If you know in advance it's
> > > > always the value associated with key 'b' that's a bag, why don't you cast
> > > > that single value?
> > > >
> > > > B = foreach A generate (bag{})document#'b' as b;
> > > >
> > > > On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <[email protected]> wrote:
> > > >
> > > > > Hi Prashant:
> > > > >
> > > > > I read about the map data type in the book "Programming Pig", it says:
> > > > > "... By default there is no requirement that all values in a map must be
> > > > > of the same type. It is legitimate to have a map with two keys name and
> > > > > age, where the value for name is a chararray and the value for age is an
> > > > > int. Beginning in Pig 0.9, a map can declare its values to all be of the
> > > > > same type... "
> > > > >
> > > > > I agree that all values in the map can be of the same type but this is
> > > > > not required in Pig.
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Jerry
> > > > >
> > > > > On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <[email protected]> wrote:
> > > > >
> > > > > > Hi Ruslan:
> > > > > >
> > > > > > I used PigStorage to store data that originally used Pig data types.
> > > > > > It is strange (or a bug in Pig) that I cannot read data using
> > > > > > PigStorage that was stored using PigStorage, isn't it?
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> > > > > >
> > > > > > > The output:
> > > > > > > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > > > ()
> > > > > > > looks weird.
> > > > > > >
> > > > > > > Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> > > > > > >
> > > > > > > A load function that parses a line of input into fields using a
> > > > > > > character delimiter
> > > > > > >
> > > > > > > So I guess this is just for simple csv lines.
> > > > > > > But you are trying to load a complicated Map structure as it was
> > > > > > > formatted by previous storing.
> > > > > > > Probably you'll need to write your own Loader for this.
> > > > > > > Another hint: using the -schema parameter to PigStorage, but I am not
> > > > > > > sure it can help :(
> > > > > > >
> > > > > > > Ruslan
> > > > > > >
> > > > > > > On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Ruslan:
> > > > > > > >
> > > > > > > > I did a describe B followed by a dump B, the output is:
> > > > > > > > B: {b: {()}}
> > > > > > > >
> > > > > > > > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > > > > ()
> > > > > > > >
> > > > > > > > but when I executed
> > > > > > > >
> > > > > > > > C = foreach B generate flatten(b);
> > > > > > > >
> > > > > > > > dump C;
> > > > > > > >
> > > > > > > > I got the exception again...
> > > > > > > >
> > > > > > > > 2013-04-17 15:47:39,933 [Thread-26] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > > > >     at java.lang.Thread.run(Thread.java:680)
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > >
> > > > > > > > Jerry
> > > > > > > >
> > > > > > > > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I think that before doing the FLATTEN, you should be 100% sure that
> > > > > > > > > your cast worked properly. Can you first DESCRIBE B and then DUMP B
> > > > > > > > > right away?
> > > > > > > > > Or probably it just can't be cast in this way. Honestly I don't know
> > > > > > > > > exactly how it works, but here:
> > > > > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > > > > > > > I see that casting from a map to a bag should produce an error.
> > > > > > > > > Hope that helps.
> > > > > > > > >
> > > > > > > > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ruslan:
> > > > > > > > > >
> > > > > > > > > > Thanks for your help. I really appreciate it. It really puzzled me.
> > > > > > > > > >
> > > > > > > > > > I did a "describe B", the output is "B: {b: bytearray}".
> > > > > > > > > >
> > > > > > > > > > I then tried to cast it as suggested, I got:
> > > > > > > > > > B = foreach A generate document#'b' as b:{};
> > > > > > > > > > describe B;
> > > > > > > > > > B: {b: {()}}
> > > > > > > > > >
> > > > > > > > > > Then I proceeded with:
> > > > > > > > > > C = foreach B generate flatten(b);
> > > > > > > > > >
> > > > > > > > > > I got:
> > > > > > > > > > 2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > > > > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > > > > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > > > > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > > > > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > > > > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > > > > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > > > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > > > > > >     at java.lang.Thread.run(Thread.java:680)
> > > > > > > > > >
> > > > > > > > > > Best Regards,
> > > > > > > > > >
> > > > > > > > > > Jerry
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey, and as for converting a map of tuples, probably I got you wrong.
> > > > > > > > > > > If you can get to every value manually within FOREACH then I see no
> > > > > > > > > > > problem in doing so.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I am not sure whether you can convert a map to a tuple.
> > > > > > > > > > > > But I am curious about one thing:
> > > > > > > > > > > > you are trying to use 'b' as a Bag, right?
> > > > > > > > > > > > Because FLATTEN needs it to be a Bag I guess:
> > > > > > > > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > > > > > > > > > > But it seems that Pig thinks that b is a byte array:
> > > > > > > > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> > > > > > > > > > > > cannot be cast to org.apache.pig.data.DataBag
> > > > > > > > > > > > Can you do this?:
> > > > > > > > > > > > DESCRIBE B
> > > > > > > > > > > >
> > > > > > > > > > > > I suppose it can look like a Bag in the output of DUMP, but I think
> > > > > > > > > > > > Pig doesn't know it is a Bag, maybe you'll need some kind of
> > > > > > > > > > > > explicit cast?
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ruslan,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I tried to debug each step already but no luck.
> > > > > > > > > > > > > I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > > > > > > > > > > > > I got {([c#11,d#22]),([c#33,d#44])}
> > > > > > > > > > > > > but it fails when I did C = foreach B generate flatten(b);
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't have control over the input. It is passed as a Map of Maps.
> > > > > > > > > > > > > I guess it makes lookup easier using a map with keys.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can I convert map to tuple?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jerry
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Jerry,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would recommend debugging the issue step by step. Start with just
> > > > > > > > > > > > > > this line:
> > > > > > > > > > > > > > A = load 'data.txt' as document:[];
> > > > > > > > > > > > > > and then right after that:
> > > > > > > > > > > > > > DESCRIBE A;
> > > > > > > > > > > > > > DUMP A;
> > > > > > > > > > > > > > and so on...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To be honest I haven't used maps that much. Just curious, why did
> > > > > > > > > > > > > > you choose to use them? You can also use regular tuples for storing
> > > > > > > > > > > > > > the relations. Also you can store the tuples with a schema file.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ruslan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi pig users,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I tried to load data using PigStorage that was previously stored
> > > > > > > > > > > > > > > using PigStorage but it failed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Each line looks like this in the data file that is generated by
> > > > > > > > > > > > > > > PigStorage:
> > > > > > > > > > > > > > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I did the following:
> > > > > > > > > > > > > > > A = load 'data.txt' as document:[];
> > > > > > > > > > > > > > > B = foreach A generate document#'b' as b;
> > > > > > > > > > > > > > > C = foreach B generate flatten(b);
> > > > > > > > > > > > > > > dump C;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I expect to see the following output:
> > > > > > > > > > > > > > > ([c#11,d#22])
> > > > > > > > > > > > > > > ([c#33,d#44])
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Instead, I got:
> > > > > > > > > > > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> > > > > > > > > > > > > > > cannot be cast to org.apache.pig.data.DataBag
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Has anyone encountered this problem before? How can I read the
> > > > > > > > > > > > > > > data back?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jerry
