About Multiple Join in Pig

2016-10-31 Thread mingda li
Dear all, I am working on optimizing multi-way joins. I am not sure whether Pig can choose the join order in its optimization layer. Does anyone know? Or does Pig simply execute the query in the order it is written? I also want to do a multi-way join on different keys. Can the following query work?

Re: About Multiple Join in Pig

2016-11-01 Thread mingda li
and running with MapReduce, every join statement > will be a separate MapReduce job and the join runtime is the runtime for > its MapReduce job. > > Thanks, > Daniel > > > > On 10/31/16, 8:21 PM, "mingda li" wrote: > > >Dear all, > > > >I am doi

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
> > On 11/1/16, 10:52 AM, "mingda li" wrote: > > >Dear Dai, > > > >Thanks for your reply. > >What I want to do is to compare two different join orders. The query > >is as follows: > > > >*Bad_OrderIn = JOIN inventory BY in

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
Dear all, Hi, I now want to import a UDF into Pig. Has anyone ever done so? I want to import Google's guava/murmur3_32 into Pig. Could anyone point me to some useful material or offer suggestions? Bests, Mingda On Wed, Nov 2, 2016 at 2:11 AM, mingda li wrote: > Yeah, I see. Thanks

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
se/DATAFU-47 > > > > > On 11/2/16, 11:45 AM, "mingda li" wrote: > > >Dear all, > > > >Hi, I now want to import a UDF into Pig. Has anyone ever > >done so? I want to import Google's guava/murmur3_32 into Pig. Could anyone

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
: > It says that pig could not find the class Hasher. Start grunt with > -Dpig.additional.jars (before other pig arguments) or do a "register" of > individual jars before typing in your scripts. > > Regards, > Debabrata > > On Nov 3, 2016 07:09, "mingda li

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
1.txt' using PigStorage() as (val:int); define MurmurH32 datafu.pig.hash.Hasher('murmur3-32'); dat = FOREACH data GENERATE MurmurH32(val); On Wed, Nov 2, 2016 at 9:35 PM, mingda li wrote: > Well, thanks Debabrata, but actually I do register each time (I forgot to tell > you)

Re: About Multiple Join in Pig

2016-11-02 Thread mingda li
hare the error inside the log file > mentioned in the output? > > On Nov 3, 2016 10:12, "mingda li" wrote: > > > My query is as follows: > > > > pig > > -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/ > > datafu-pig-incub

Re: About Multiple Join in Pig

2016-11-03 Thread mingda li
Does anyone have an idea about this problem? I still cannot solve it. On Wed, Nov 2, 2016 at 11:33 PM, mingda li wrote: > Yeah, the log file's content is as follows: > > 1 Pig Stack Trace > > 2 --- > > 3 ERROR 1070: Could not resolve datafu.pig

How to test the efficiency of multiple join

2016-12-06 Thread mingda li
Dear all, I want to test the efficiency of different multi-way join orders. However, since a Pig query is executed lazily, I need to use dump or store to force the query to execute. For now, I use the following query to test the efficiency. *Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales

Re: How to test the efficiency of multiple join

2016-12-06 Thread mingda li
ng,Liyun > > > > -----Original Message- > From: mingda li [mailto:limingda1...@gmail.com] > Sent: Wednesday, December 7, 2016 8:18 AM > To: d...@pig.apache.org; user@pig.apache.org > Subject: How to test the efficiency of multiple join > > Dear all, > > I want to tes

File could only be replicated to 0 nodes, instead of 1

2016-12-06 Thread mingda li
Hi, I am running a multi-way join over 100 GB of TPC-DS data with a bad join order on our cluster. Each time, it returns a log file with the exception below. Has anyone run into this? Is it caused by the data exceeding the available disk space? * org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp

Re: How to test the efficiency of multiple join

2016-12-07 Thread mingda li
uld make processing of join stop after 4 records. It is not a > good idea to add it if you are testing performance of join. > > On Tue, Dec 6, 2016 at 8:13 PM mingda li wrote: > > > Thanks for your quick reply. If so, I can use the limit operator to > compare

Best way to test join efficiency

2016-12-12 Thread mingda li
Dear all, I am testing the efficiency of multi-way joins in Pig. To force the join to execute, I use COUNT_STAR. Since counting in Pig requires a GROUP operation first, I optimize the operation by converting the following query: Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales B