Yeah, the log file's content is as follows:

Pig Stack Trace
---------------
ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1572)
        at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9403)
        at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11082)
        at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10841)
        at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10190)
        at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7519)
        at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17621)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16013)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15880)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        ... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:677)
        at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:793)
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1569)
        ... 28 more
=========================

On Wed, Nov 2, 2016 at 11:27 PM, Debabrata Pani <android.p...@gmail.com> wrote:
> Just to be doubly sure, can you share the error inside the log file
> mentioned in the output?
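For readers hitting the same ERROR 1070: the pattern being debugged here is registering the DataFu jar and then defining the UDF by its fully qualified class name. A minimal sketch, reusing the jar path and statements quoted later in this thread; whether this alone fixes the error also depends on the registered jar actually containing datafu.pig.hash.Hasher (and its Guava dependency), which is what the DATAFU-47 reference below is about:

```
-- Sketch only: jar path and statements taken from this thread; adjust to
-- your install. ERROR 1070 means Pig could not load the class from its
-- classpath, so REGISTER (or -Dpig.additional.jars) must point at a jar
-- that really contains datafu.pig.hash.Hasher.
REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar;

DEFINE MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt'
       USING PigStorage() AS (val:int);
dat  = FOREACH data GENERATE MurmurH32(val);
```

Note the call `MurmurH32(val)` on an int field is reproduced from the poster's script as-is; depending on the Hasher implementation a cast to chararray or bytearray may be needed.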
> On Nov 3, 2016 10:12, "mingda li" <limingda1...@gmail.com> wrote:
> > My query is as follows:
> >
> > pig -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/datafu-pig-incubating-1.3.1.jar
> >
> > to open pig. Then, input:
> >
> > *REGISTER* /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar
> >
> > data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
> >
> > define MurmurH32 datafu.pig.hash.Hasher('murmur3-32');
> >
> > dat = FOREACH data GENERATE MurmurH32(val);
> >
> > On Wed, Nov 2, 2016 at 9:35 PM, mingda li <limingda1...@gmail.com> wrote:
> > > En, thanks Debabrata, but actually, I register each time (forgot to tell you) before I run the commands.
> > > I use *REGISTER* /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar,
> > > but it does not help.
> > >
> > > Any other reason?
> > >
> > > Thanks
> > >
> > > On Wed, Nov 2, 2016 at 8:03 PM, Debabrata Pani <android.p...@gmail.com> wrote:
> > > > It says that pig could not find the class Hasher. Start grunt with -Dpig.additional.jars (before other pig arguments) or do a "register" of individual jars before typing in your scripts.
> > > >
> > > > Regards,
> > > > Debabrata
> > > >
> > > > On Nov 3, 2016 07:09, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > Thanks. I have tried to install datafu and finished the quickstart successfully: http://datafu.incubator.apache.org/docs/quick-start.html
> > > > >
> > > > > But when I use the murmur hash, it fails, and I do not know why.
> > > > >
> > > > > grunt> data = LOAD 'hdfs://***.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
> > > > > grunt> data_out = FOREACH data GENERATE val;
> > > > > grunt> dat = FOREACH data GENERATE MurmurH32(val);
> > > > > 2016-11-02 18:25:18,424 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > > > > Details at logfile: /home/hadoop-user/pig-branch-0.15/bin/pig_1478136031217.log
> > > > >
> > > > > The log file is in the attachment.
> > > > >
> > > > > Bests,
> > > > > Mingda
> > > > >
> > > > > On Wed, Nov 2, 2016 at 2:04 PM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > I see datafu has a patch for the UDF: https://issues.apache.org/jira/browse/DATAFU-47
> > > > > >
> > > > > > On 11/2/16, 11:45 AM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > Dear all,
> > > > > > >
> > > > > > > Hi, now I want to import a UDF function into a pig command. Has anyone ever done so? I want to import google's guava/murmur3_32 into pig. Could anyone give some useful materials or suggestions?
> > > > > > >
> > > > > > > Bests,
> > > > > > > Mingda
> > > > > > >
> > > > > > > On Wed, Nov 2, 2016 at 2:11 AM, mingda li <limingda1...@gmail.com> wrote:
> > > > > > > > Yeah, I see. Thanks for your reply.
> > > > > > > >
> > > > > > > > Bests,
> > > > > > > > Mingda
> > > > > > > >
> > > > > > > > On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > > > > Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see two MapReduce jobs corresponding to the first and second join.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Daniel
> > > > > > > > >
> > > > > > > > > On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > > > > Dear Dai,
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply.
> > > > > > > > > > What I want to do is to compare two different orders of the join.
> > > > > > > > > > The query is as follows:
> > > > > > > > > >
> > > > > > > > > > *Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;*
> > > > > > > > > > *Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);*
> > > > > > > > > > *Dump or Store Bad_OrderRes;*
> > > > > > > > > >
> > > > > > > > > > *Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number), catalog_sales BY (cs_item_sk, cs_order_number);*
> > > > > > > > > > *Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;*
> > > > > > > > > > *Dump or Store Good_OrderRes;*
> > > > > > > > > >
> > > > > > > > > > Since Pig executes the query lazily, I think only by Dumping or Storing the result can I know the time of the MapReduce job. Is that right? If it is, then I need to count the time to Dump or Store the result as the time for each join order.
> > > > > > > > > >
> > > > > > > > > > Bests,
> > > > > > > > > > Mingda
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > > > > > > Hi, Mingda,
> > > > > > > > > > >
> > > > > > > > > > > Pig does not do join reordering and will execute the query the way it is written. Note you can join multiple relations in one join statement.
> > > > > > > > > > >
> > > > > > > > > > > Do you want the execution time for each join in your statement? I assume you are using a regular join and running with MapReduce; every join statement will be a separate MapReduce job, and the join runtime is the runtime of its MapReduce job.
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Daniel
> > > > > > > > > > >
> > > > > > > > > > > On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > > > > > > Dear all,
> > > > > > > > > > > >
> > > > > > > > > > > > I am doing optimization for a multiple join. I am not sure if Pig can decide the join order in its optimization layer. Does anyone know about this? Or does Pig just execute the query the way it is written?
> > > > > > > > > > > >
> > > > > > > > > > > > Also, I want to do the multi-way join on different keys. Can the following query work?
> > > > > > > > > > > >
> > > > > > > > > > > > Res =
> > > > > > > > > > > > JOIN
> > > > > > > > > > > > (JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
> > > > > > > > > > > >
> > > > > > > > > > > > BTW, each time I run the query, it finishes in one second. Is there a way to see the execution time? I have set pig.udf.profile=true. Where can I find the time?
> > > > > > > > > > > >
> > > > > > > > > > > > Bests,
> > > > > > > > > > > > Mingda
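Daniel's answers in this thread can be put together in one script: each JOIN statement compiles to its own MapReduce job, so forcing execution with STORE (or DUMP) and timing each job gives the per-order cost, while EXPLAIN shows the compiled plan without running it. A hedged sketch using the relation and field names from the thread; the output paths are hypothetical placeholders, and since Pig's relational operators generally take previously defined aliases as operands, the nested one-statement form from the original question is written here as two statements:

```
-- Order 1 ("bad"): inventory joined first, then the returns join.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);
EXPLAIN Bad_OrderRes;                          -- inspect the compiled plan
STORE Bad_OrderRes INTO '/tmp/bad_order';      -- hypothetical path; forces execution

-- Order 2 ("good"): sales/returns joined first, then inventory.
Good_OrderIn  = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
                     catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
EXPLAIN Good_OrderRes;
STORE Good_OrderRes INTO '/tmp/good_order';    -- hypothetical path
```

With MapReduce execution, each of the two STOREs kicks off two jobs (one per JOIN), and the per-job runtimes reported by the JobTracker are the timings being compared.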