Hi Ruslan,
Thanks for your help; I really appreciate it. This problem has really puzzled me.
I ran "describe B" and the output is "B: {b: bytearray}".
I then tried to cast it as suggested and got:
B = foreach A generate document#'b' as b:{};
describe B;
B: {b: {()}}
Then I proceeded with:
C = foreach B generate flatten(b);
I got:
2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)
Best Regards,
Jerry
On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> Hey, and as for converting a map of tuples, I probably got you wrong. If
> you can get to every value manually within FOREACH then I see no problem
> in doing so.
>
>
> On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > I am not sure whether you can convert a map to a tuple.
> > But I am curious about one thing:
> > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to be
> > a Bag I guess:
> > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > But it seems that Pig thinks that b is a byte array:
> > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > cast to org.apache.pig.data.DataBag
> > Can you do this?:
> > DESCRIBE B
> >
> > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
> >
> >
> > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <[email protected]> wrote:
> >
> >> Hi Ruslan,
> >>
> >> I tried to debug each step already but no luck.
> >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> >> I got {([c#11,d#22]),([c#33,d#44])}
> >> but it fails when I did C = foreach B generate flatten(b);
> >>
> >> I don't have control over the input. It is passed as a Map of Maps. I guess
> >> it makes lookup easier using a map with keys.
> >>
> >> Can I convert map to tuple?
> >>
> >> Best Regards,
> >>
> >> Jerry
> >>
> >>
> >>
> >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> >>
> >> > Hi Jerry,
> >> >
> >> > I would recommend debugging the issue step by step. Just after this line:
> >> > A = load 'data.txt' as document:[];
> >> > and then right after that:
> >> > DESCRIBE A;
> >> > DUMP A;
> >> > and so on...
> >> >
> >> > To be honest I haven't used maps that much. Just curious, why did you
> >> > choose to use them? You can also use regular tuples for storing the
> >> > relations. Also you can store the tuples with a schema file.
> >> >
> >> > Ruslan
> >> >
> >> >
> >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <[email protected]> wrote:
> >> >
> >> > > Hi pig users,
> >> > >
> >> > > I tried to load data using PigStorage that was previously stored using
> >> > > PigStorage but it failed.
> >> > >
> >> > > Each line looks like this in the data file that is generated by PigStorage:
> >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> >> > >
> >> > > I did the following:
> >> > > A = load 'data.txt' as document:[];
> >> > > B = foreach A generate document#'b' as b;
> >> > > C = foreach B generate flatten(b);
> >> > > dump C;
> >> > >
> >> > > I expect to see the following output:
> >> > > ([c#11,d#22])
> >> > > ([c#33,d#44])
> >> > >
> >> > > Instead, I got:
> >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> >> > > cast to org.apache.pig.data.DataBag
> >> > >
> >> > > Has anyone encountered this problem before? How can I read the data back?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jerry
> >> > >
> >> >
> >>
> >
> >
>