Hi Jerry,
Sorry, my earlier suggestions misled you a bit :)
As for your last question: I found it interesting to investigate the
issue. Here is what I found:
https://issues.apache.org/jira/browse/PIG-2216
https://issues.apache.org/jira/browse/PIG-2315
So in this statement:
B = foreach A generate document#'b' as b:bag{};
the AS clause does not cast anything, despite the misleading Pig
syntax/behaviour. It only renames the field and declares its type in the
schema, while the value itself stays a bytearray :(
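
A minimal sketch of the two forms side by side, assuming the data.txt
layout from the original post and the Pig version discussed in this thread
(the alias names B_cast and B_renamed are made up for illustration):

```pig
-- Assumes data.txt holds lines like:
-- [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
A = load 'data.txt' as document:[];

-- Explicit cast: the bytearray map value is converted to a bag,
-- so FLATTEN can iterate over it (the form reported to work in this thread).
B_cast = foreach A generate (bag{})document#'b' as b;
C = foreach B_cast generate flatten(b);
dump C;

-- Type given only in the AS clause: the schema says "bag", but the value
-- is still a bytearray at runtime, so a later FLATTEN hits the
-- ClassCastException reported in this thread.
B_renamed = foreach A generate document#'b' as b:bag{};
describe B_renamed;
```

Comparing DESCRIBE with what actually happens at runtime should show the
mismatch; the behaviour may differ in Pig versions where the issues above
have been resolved.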

Ruslan



On Fri, Apr 19, 2013 at 2:57 AM, Jerry Lam <[email protected]> wrote:

> Hi Prashant:
>
> Just trying to understand my mistake...
> I thought "B = foreach A generate document#'b' as b:bag{};" would cast the
> bytearray to a bag because of b:bag{}. If I understand correctly, that is
> not what it does. Am I correct?
>
> Best Regards,
>
> Jerry
>
>
>
>
> On Thu, Apr 18, 2013 at 5:41 PM, Prashant Kommireddi <[email protected]
> >wrote:
>
> > Hi Jerry,
> >
> > Like I mentioned in my earlier email, "Map values by default are
> > bytearrays. If you need them to be any other type, you would need to
> > define it explicitly."
> >
> > The difference between the 2 statements is that one does a cast to "bag"
> > and the other is a bytearray (the default).
> >
> >
> >
> >
> >
> >
> > On Thu, Apr 18, 2013 at 2:14 PM, Jerry Lam <[email protected]> wrote:
> >
> > > Hi Prashant:
> > >
> > > IT WORKS! THANKS!
> > > What is the difference between:
> > > B = foreach A generate (bag{})document#'b' as b;
> > > and
> > > B = foreach A generate document#'b' as b:bag{};
> > > ?
> > >
> > > The latter gives an error: java.lang.ClassCastException:
> > > org.apache.pig.data.DataByteArray cannot be cast to
> > > org.apache.pig.data.DataBag
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > >
> > > On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi
> > > <[email protected]>wrote:
> > >
> > > > Well, let me rephrase - the values all have to be the same type if you
> > > > choose to read all columns in a similar way. If you know in advance that
> > > > it's always the value associated with key 'b' that's a bag, why don't
> > > > you cast that single value?
> > > >
> > > > B = foreach A generate (bag{})document#'b' as b;
> > > >
> > > >
> > > > On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <[email protected]>
> > wrote:
> > > >
> > > > > Hi Prashant:
> > > > >
> > > > > I read about the map data type in the book "Programming Pig". It says:
> > > > > "... By default there is no requirement that all values in a map must
> > > > > be of the same type. It is legitimate to have a map with two keys name
> > > > > and age, where the value for name is a chararray and the value for age
> > > > > is an int. Beginning in Pig 0.9, a map can declare its values to all be
> > > > > of the same type... "
> > > > >
> > > > > I agree that all values in the map can be of the same type, but this is
> > > > > not required in Pig.
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Jerry
> > > > >
> > > > >
> > > > > On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <[email protected]>
> > > > wrote:
> > > > >
> > > > > > Hi Ruslan:
> > > > > >
> > > > > > I used PigStorage to store the data, which originally used Pig data
> > > > > > types. It is strange (or a bug in Pig) that I cannot use PigStorage to
> > > > > > read data that was stored using PigStorage, isn't it?
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <
> > > > [email protected]
> > > > > >wrote:
> > > > > >
> > > > > >> The output:
> > > > > >> ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > >> ()
> > > > > >> looks weird.
> > > > > >>
> > > > > > >> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> > > > > > >>
> > > > > > >> A load function that parses a line of input into fields using a
> > > > > > >> character delimiter
> > > > > > >>
> > > > > > >> So I guess this is just for simple csv lines.
> > > > > > >> But you are trying to load a complicated Map structure as it was
> > > > > > >> formatted by the previous store.
> > > > > > >> Probably you'll need to write your own Loader for this. Another hint:
> > > > > > >> try the -schema parameter to PigStorage, but I am not sure it can
> > > > > > >> help :(
> > > > > >>
> > > > > >> Ruslan
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <
> [email protected]
> > >
> > > > > wrote:
> > > > > >>
> > > > > > >> > Hi Ruslan:
> > > > > >> >
> > > > > >> > I did a describe B followed by a dump B, the output is:
> > > > > >> > B: {b: {()}}
> > > > > >> >
> > > > > >> > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > >> > ()
> > > > > >> >
> > > > > >> > but when I executed
> > > > > >> >
> > > > > >> > C = foreach B generate flatten(b);
> > > > > >> >
> > > > > >> > dump C;
> > > > > >> >
> > > > > >> > I got the exception again...
> > > > > >> >
> > > > > >> > 2013-04-17 15:47:39,933 [Thread-26] WARN
> > > > > >> >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > >> > java.lang.Exception: java.lang.ClassCastException:
> > > > > >> > org.apache.pig.data.DataByteArray cannot be cast to
> > > > > >> > org.apache.pig.data.DataBag
> > > > > >> > at
> > > > > >>
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > >> > Caused by: java.lang.ClassCastException:
> > > > > >> org.apache.pig.data.DataByteArray
> > > > > >> > cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > >> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > >> > at
> > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > >> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > >> > at
> > > > > >>
> > > >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > >> > at
> > > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > >> > at java.lang.Thread.run(Thread.java:680)
> > > > > >> >
> > > > > >> >
> > > > > >> > Best Regards,
> > > > > >> >
> > > > > >> > Jerry
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <
> > > > > [email protected]
> > > > > >> > >wrote:
> > > > > >> >
> > > > > > >> > > I think that before doing the FLATTEN, you should be 100% sure that
> > > > > > >> > > your cast worked properly. Can you first DESCRIBE B and then DUMP B
> > > > > > >> > > right away?
> > > > > > >> > > Or probably it just can't be cast in this way. Honestly I don't know
> > > > > > >> > > exactly how it works, but here:
> > > > > > >> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > > > > >> > > I see that casting from a map to a bag should produce an error.
> > > > > > >> > > Hope that helps.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <
> > > [email protected]>
> > > > > >> wrote:
> > > > > >> > >
> > > > > > >> > > > Hi Ruslan:
> > > > > > >> > > >
> > > > > > >> > > > Thanks for your help. I really appreciate it. This problem really
> > > > > > >> > > > puzzled me.
> > > > > > >> > > >
> > > > > > >> > > > I did a "describe B"; the output is "B: {b: bytearray}".
> > > > > > >> > > >
> > > > > > >> > > > I then tried to cast it as suggested and got:
> > > > > > >> > > > B = foreach A generate document#'b' as b:{};
> > > > > > >> > > > describe B;
> > > > > > >> > > > B: {b: {()}}
> > > > > > >> > > >
> > > > > > >> > > > Then I proceeded with:
> > > > > > >> > > > C = foreach B generate flatten(b);
> > > > > > >> > > >
> > > > > > >> > > > I got:
> > > > > >> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN
> > > > > >> > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > >> > > > java.lang.Exception: java.lang.ClassCastException:
> > > > > >> > > > org.apache.pig.data.DataByteArray cannot be cast to
> > > > > >> > > > org.apache.pig.data.DataBag
> > > > > >> > > > at
> > > > > >> > >
> > > > > >>
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > >> > > > Caused by: java.lang.ClassCastException:
> > > > > >> > > org.apache.pig.data.DataByteArray
> > > > > >> > > > cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > >> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > >> > > > at
> > > > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > >> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > >> > > > at
> > > > > >> > >
> > > > > >>
> > > >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > >> > > > at
> > > > > >>
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > >> > > > at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > >> > > > at
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > >> > > > at java.lang.Thread.run(Thread.java:680)
> > > > > >> > > >
> > > > > >> > > > Best Regards,
> > > > > >> > > >
> > > > > >> > > > Jerry
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <
> > > > > >> > [email protected]
> > > > > >> > > > >wrote:
> > > > > >> > > >
> > > > > > >> > > > > Hey, and as for converting a map of tuples, I probably got you wrong.
> > > > > > >> > > > > If you can get to every value manually within FOREACH, then I see no
> > > > > > >> > > > > problem in doing so.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <
> > > > > >> > > [email protected]
> > > > > >> > > > > >wrote:
> > > > > >> > > > >
> > > > > > >> > > > > > I am not sure whether you can convert a map to a tuple.
> > > > > > >> > > > > > But I am curious about one thing:
> > > > > > >> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to
> > > > > > >> > > > > > be a Bag, I guess:
> > > > > > >> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > > > > >> > > > > > But it seems that Pig thinks that b is a byte array:
> > > > > > >> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
> > > > > > >> > > > > > be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > > > Can you do this?:
> > > > > > >> > > > > > DESCRIBE B
> > > > > > >> > > > > >
> > > > > > >> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > > > > > >> > > > > > doesn't know it is a Bag. Maybe you'll need some kind of explicit cast?
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <
> > > > > >> [email protected]>
> > > > > >> > > > wrote:
> > > > > >> > > > > >
> > > > > > >> > > > > >> Hi Ruslan,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> I tried to debug each step already but no luck.
> > > > > > >> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > > > > > >> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> > > > > > >> > > > > >> but it fails when I did C = foreach B generate flatten(b);
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> I don't have control over the input. It is passed as a Map of Maps. I
> > > > > > >> > > > > >> guess it makes lookup easier using a map with keys.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Can I convert a map to a tuple?
> > > > > >> > > > > >>
> > > > > >> > > > > >> Best Regards,
> > > > > >> > > > > >>
> > > > > >> > > > > >> Jerry
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
> > > > > >> > > > > [email protected]
> > > > > >> > > > > >> >wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >> > Hi Jerry,
> > > > > >> > > > > >> >
> > > > > > >> > > > > >> > I would recommend debugging the issue step by step. Just after this
> > > > > > >> > > > > >> > line:
> > > > > > >> > > > > >> > A = load 'data.txt' as document:[];
> > > > > > >> > > > > >> > run:
> > > > > > >> > > > > >> > DESCRIBE A;
> > > > > > >> > > > > >> > DUMP A;
> > > > > > >> > > > > >> > and so on...
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > To be honest I haven't used maps that much. Just curious, why did you
> > > > > > >> > > > > >> > choose to use them? You can also use regular tuples for storing the
> > > > > > >> > > > > >> > relations. Also you can store the tuples with a schema file.
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > Ruslan
> > > > > >> > > > > >> >
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <
> > > > > >> > [email protected]>
> > > > > >> > > > > >> wrote:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > Hi pig users,
> > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I tried to load data using PigStorage that was previously stored using
> > > > > > >> > > > > >> > > PigStorage but it failed.
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Each line looks like this in the data file that is generated by
> > > > > > >> > > > > >> > > PigStorage:
> > > > > > >> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I did the following:
> > > > > > >> > > > > >> > > A = load 'data.txt' as document:[];
> > > > > > >> > > > > >> > > B = foreach A generate document#'b' as b;
> > > > > > >> > > > > >> > > C = foreach B generate flatten(b);
> > > > > > >> > > > > >> > > dump C;
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I expect to see the following output:
> > > > > > >> > > > > >> > > ([c#11,d#22])
> > > > > > >> > > > > >> > > ([c#33,d#44])
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Instead, I got:
> > > > > > >> > > > > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
> > > > > > >> > > > > >> > > be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Has anyone encountered this problem before? How can I read the data
> > > > > > >> > > > > >> > > back?
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Jerry
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> >
> > > > > >> > > > > >>
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
