Yeah, the log file's content is as follows:

Pig Stack Trace
---------------
ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1572)
        at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9403)
        at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11082)
        at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10841)
        at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10190)
        at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7519)
        at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17621)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16013)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15880)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        ... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:677)
        at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:793)
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1569)
        ... 28 more
=========================

On Wed, Nov 2, 2016 at 11:27 PM, Debabrata Pani <android.p...@gmail.com> wrote:
> Just to be doubly sure, can you share the error inside the log file
> mentioned in the output?
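For readers hitting the same ERROR 1070: the pattern being debugged here is registering the DataFu jar and then defining the UDF by its fully qualified class name. A minimal sketch, reusing the jar path and statements quoted later in this thread; whether this alone fixes the error also depends on the registered jar actually containing datafu.pig.hash.Hasher (and its Guava dependency), which is what the DATAFU-47 reference below is about:

```
-- Sketch only: jar path and statements taken from this thread; adjust to
-- your install. ERROR 1070 means Pig could not load the class from its
-- classpath, so REGISTER (or -Dpig.additional.jars) must point at a jar
-- that really contains datafu.pig.hash.Hasher.
REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar;

DEFINE MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt'
       USING PigStorage() AS (val:int);
dat  = FOREACH data GENERATE MurmurH32(val);
```

Note the call `MurmurH32(val)` on an int field is reproduced from the poster's script as-is; depending on the Hasher implementation a cast to chararray or bytearray may be needed.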
> On Nov 3, 2016 10:12, "mingda li" <limingda1...@gmail.com> wrote:
> > My query is as follows:
> >
> > pig -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/datafu-pig-incubating-1.3.1.jar
> >
> > to open pig. Then, input:
> >
> > *REGISTER* /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar
> >
> > data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
> >
> > define MurmurH32 datafu.pig.hash.Hasher('murmur3-32');
> >
> > dat = FOREACH data GENERATE MurmurH32(val);
> >
> > On Wed, Nov 2, 2016 at 9:35 PM, mingda li <limingda1...@gmail.com> wrote:
> > > En, thanks Debabrata, but actually, I register each time (forgot to tell you) before I run the commands.
> > > I use *REGISTER* /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar,
> > > but it does not help.
> > >
> > > Any other reason?
> > >
> > > Thanks
> > >
> > > On Wed, Nov 2, 2016 at 8:03 PM, Debabrata Pani <android.p...@gmail.com> wrote:
> > > > It says that pig could not find the class Hasher. Start grunt with -Dpig.additional.jars (before other pig arguments) or do a "register" of individual jars before typing in your scripts.
> > > >
> > > > Regards,
> > > > Debabrata
> > > >
> > > > On Nov 3, 2016 07:09, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > Thanks. I have tried to install datafu and finished the quickstart successfully: http://datafu.incubator.apache.org/docs/quick-start.html
> > > > >
> > > > > But when I use the murmur hash, it fails, and I do not know why.
> > > > >
> > > > > grunt> data = LOAD 'hdfs://***.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
> > > > > grunt> data_out = FOREACH data GENERATE val;
> > > > > grunt> dat = FOREACH data GENERATE MurmurH32(val);
> > > > > 2016-11-02 18:25:18,424 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > > > > Details at logfile: /home/hadoop-user/pig-branch-0.15/bin/pig_1478136031217.log
> > > > >
> > > > > The log file is in the attachment.
> > > > >
> > > > > Bests,
> > > > > Mingda
> > > > >
> > > > > On Wed, Nov 2, 2016 at 2:04 PM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > I see datafu has a patch for the UDF: https://issues.apache.org/jira/browse/DATAFU-47
> > > > > >
> > > > > > On 11/2/16, 11:45 AM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > Dear all,
> > > > > > >
> > > > > > > Hi, now I want to import a UDF function into a pig command. Has anyone ever done so? I want to import google's guava/murmur3_32 into pig. Could anyone give some useful materials or suggestions?
> > > > > > >
> > > > > > > Bests,
> > > > > > > Mingda
> > > > > > >
> > > > > > > On Wed, Nov 2, 2016 at 2:11 AM, mingda li <limingda1...@gmail.com> wrote:
> > > > > > > > Yeah, I see. Thanks for your reply.
> > > > > > > >
> > > > > > > > Bests,
> > > > > > > > Mingda
> > > > > > > >
> > > > > > > > On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > > > > Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see two MapReduce jobs corresponding to the first and second join.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Daniel
> > > > > > > > >
> > > > > > > > > On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > > > > Dear Dai,
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply.
> > > > > > > > > > What I want to do is to compare two different orders of the join.
> > > > > > > > > > The query is as follows:
> > > > > > > > > >
> > > > > > > > > > *Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;*
> > > > > > > > > > *Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);*
> > > > > > > > > > *Dump or Store Bad_OrderRes;*
> > > > > > > > > >
> > > > > > > > > > *Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number), catalog_sales BY (cs_item_sk, cs_order_number);*
> > > > > > > > > > *Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;*
> > > > > > > > > > *Dump or Store Good_OrderRes;*
> > > > > > > > > >
> > > > > > > > > > Since Pig executes the query lazily, I think only by Dumping or Storing the result can I know the time of the MapReduce job. Is that right? If it is, then I need to count the time to Dump or Store the result as the time for each join order.
> > > > > > > > > >
> > > > > > > > > > Bests,
> > > > > > > > > > Mingda
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > > > > > > Hi, Mingda,
> > > > > > > > > > >
> > > > > > > > > > > Pig does not do join reordering and will execute the query the way it is written. Note you can join multiple relations in one join statement.
> > > > > > > > > > >
> > > > > > > > > > > Do you want the execution time for each join in your statement? I assume you are using a regular join and running with MapReduce; every join statement will be a separate MapReduce job, and the join runtime is the runtime of its MapReduce job.
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Daniel
> > > > > > > > > > >
> > > > > > > > > > > On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
> > > > > > > > > > > > Dear all,
> > > > > > > > > > > >
> > > > > > > > > > > > I am doing optimization for a multiple join. I am not sure if Pig can decide the join order in its optimization layer. Does anyone know about this? Or does Pig just execute the query the way it is written?
> > > > > > > > > > > >
> > > > > > > > > > > > Also, I want to do the multi-way join on different keys. Can the following query work?
> > > > > > > > > > > >
> > > > > > > > > > > > Res =
> > > > > > > > > > > > JOIN
> > > > > > > > > > > > (JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
> > > > > > > > > > > >
> > > > > > > > > > > > BTW, each time I run the query, it finishes in one second. Is there a way to see the execution time? I have set pig.udf.profile=true. Where can I find the time?
> > > > > > > > > > > >
> > > > > > > > > > > > Bests,
> > > > > > > > > > > > Mingda
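Daniel's answers in this thread can be put together in one script: each JOIN statement compiles to its own MapReduce job, so forcing execution with STORE (or DUMP) and timing each job gives the per-order cost, while EXPLAIN shows the compiled plan without running it. A hedged sketch using the relation and field names from the thread; the output paths are hypothetical placeholders, and since Pig's relational operators generally take previously defined aliases as operands, the nested one-statement form from the original question is written here as two statements:

```
-- Order 1 ("bad"): inventory joined first, then the returns join.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);
EXPLAIN Bad_OrderRes;                          -- inspect the compiled plan
STORE Bad_OrderRes INTO '/tmp/bad_order';      -- hypothetical path; forces execution

-- Order 2 ("good"): sales/returns joined first, then inventory.
Good_OrderIn  = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
                     catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
EXPLAIN Good_OrderRes;
STORE Good_OrderRes INTO '/tmp/good_order';    -- hypothetical path
```

With MapReduce execution, each of the two STOREs kicks off two jobs (one per JOIN), and the per-job runtimes reported by the JobTracker are the timings being compared.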