Dear all,
I am working on optimizing multiple joins. I am not sure whether Pig can decide
the join order in its optimization layer. Does anyone know about this? Or does Pig
just execute the query the way it is written?
Also, I want to do a multi-way join on different keys. Can the
following query work?
and running with MapReduce, every join statement
> will be a separate MapReduce job and the join runtime is the runtime for
> its MapReduce job.
>
> Thanks,
> Daniel
>
>
>
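A minimal sketch of how two join orders can be written out for comparison. It assumes, as Daniel's reply describes, that Pig runs the JOIN statements in the order they are written and that each one becomes its own MapReduce job. The relations `A`, `B`, `C`, their paths, and their key fields are hypothetical placeholders. Because `A` joins `B` on one key and `B` joins `C` on another, the three-way join is written as two cascaded two-way JOINs: a single multi-way JOIN statement joins all of its inputs on the same key.

```pig
-- Hypothetical inputs: A(a_id, b_id), B(b_id, c_id), C(c_id, val).
A = LOAD 'input/A' USING PigStorage() AS (a_id:int, b_id:int);
B = LOAD 'input/B' USING PigStorage() AS (b_id:int, c_id:int);
C = LOAD 'input/C' USING PigStorage() AS (c_id:int, val:int);

-- Order 1: join A with B first, then join that result with C.
AB   = JOIN A BY b_id, B BY b_id;
ABC1 = JOIN AB BY B::c_id, C BY c_id;

-- Order 2: join B with C first, then join that result with A.
BC   = JOIN B BY c_id, C BY c_id;
ABC2 = JOIN BC BY B::b_id, A BY b_id;

-- EXPLAIN prints the logical, physical and MapReduce plans, so you can
-- check how each order is compiled before running anything.
EXPLAIN ABC1;
EXPLAIN ABC2;
```

Both orders produce the same set of joined records; what differs is the size of the intermediate relation (`AB` versus `BC`) that the second job has to read and shuffle.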
> On 10/31/16, 8:21 PM, "mingda li" wrote:
>
> >Dear all,
> >
> >I am doi
> On 11/1/16, 10:52 AM, "mingda li" wrote:
>
> >Dear Dai,
> >
> >Thanks for your reply.
> >What I want to do is to compare two different join orders. The query
> >is as follows:
> >
> >*Bad_OrderIn = JOIN inventory BY in
Dear all,
Hi, I now want to import a UDF into a Pig command. Has anyone
done so? I want to use Google Guava's murmur3_32 in Pig. Could anyone
point me to some useful materials or suggestions?
Bests,
Mingda
On Wed, Nov 2, 2016 at 2:11 AM, mingda li wrote:
> Yeah, I see. Thanks
se/DATAFU-47
>
>
>
>
> On 11/2/16, 11:45 AM, "mingda li" wrote:
>
> >Dear all,
> >
> >Hi, I now want to import a UDF into a Pig command. Has anyone
> >done so? I want to use Google Guava's murmur3_32 in Pig. Could anyone
> >point me to some useful materials or suggestions?
> It says that pig could not find the class Hasher. Start grunt with
> -Dpig.additional.jars (before other pig arguments) or do a "register" of
> individual jars before typing in your scripts.
>
> Regards,
> Debabrata
>
> On Nov 3, 2016 07:09, "mingda li
1.txt' using
PigStorage() as (val:int);
define MurmurH32 datafu.pig.hash.Hasher('murmur3-32');
dat = FOREACH data GENERATE MurmurH32(val);
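A sketch of a complete script around the DEFINE above, with REGISTER statements placed before the DEFINE that names the class. The jar paths and the input path are hypothetical placeholders. Registering Guava separately is only needed if your DataFu jar does not already bundle it (an assumption to check against your build), and the registered DataFu version must actually contain `datafu.pig.hash.Hasher`; otherwise Pig cannot resolve the class and reports an `ERROR 1070: Could not resolve ...` message. Passing the jars with `-Dpig.additional.jars` on the command line, as Debabrata suggests, is the alternative to REGISTER.

```pig
-- Hypothetical jar paths: point these at the jars on your machine.
REGISTER /path/to/datafu-pig.jar;
-- Only needed if the DataFu jar does not already bundle Guava (assumption).
REGISTER /path/to/guava.jar;

DEFINE MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

-- Hypothetical input path; one integer per line.
data = LOAD 'input/numbers.txt' USING PigStorage() AS (val:int);

-- Hasher hashes a string value, so the int is cast to chararray first
-- (an assumption based on DataFu's description of the UDF).
hashed = FOREACH data GENERATE val, MurmurH32((chararray) val) AS h;
DUMP hashed;
```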
On Wed, Nov 2, 2016 at 9:35 PM, mingda li wrote:
> Hmm, thanks Debabrata, but actually I do register the jar each time (I forgot
> to tell you).
hare the error inside the log file
> mentioned in the output?
>
> On Nov 3, 2016 10:12, "mingda li" wrote:
>
> > My query is as following:
> >
> > pig
> > -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/
> > datafu-pig-incub
Does anyone have an idea about this problem? I still cannot solve it.
On Wed, Nov 2, 2016 at 11:33 PM, mingda li wrote:
> Yeah, the log file's content is as follows:
>
> Pig Stack Trace
>
> ---
>
> ERROR 1070: Could not resolve datafu.pig
Dear all,
I want to test the efficiency of different multi-way join orders. However,
since Pig queries are executed lazily, I need to use DUMP or STORE to make
the query execute.
Currently, I use the following query to test the efficiency:
*Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales
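Because Pig evaluates lazily, nothing runs until a DUMP or STORE is reached. A minimal sketch, with hypothetical relations, keys, and output path, that forces one join order to run by storing its full result. STORE writes every output record, so unlike LIMIT it does not cut the join short, and the per-job statistics Pig prints when a MapReduce-mode run finishes can then be compared across orders.

```pig
-- Hypothetical inputs: A joins B on k1, and the result joins C on k2.
A = LOAD 'input/A' USING PigStorage() AS (k1:int, k2:int);
B = LOAD 'input/B' USING PigStorage() AS (k1:int, v:int);
C = LOAD 'input/C' USING PigStorage() AS (k2:int, w:int);

AB  = JOIN A BY k1, B BY k1;
ABC = JOIN AB BY A::k2, C BY k2;

-- STORE triggers execution of the whole pipeline above.
-- The output directory must not already exist.
STORE ABC INTO 'output/abc_order1' USING PigStorage();
```

Running the other order as a separate script with a different output directory keeps the two measurements independent.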
ng,Liyun
>
>
>
> -----Original Message-----
> From: mingda li [mailto:limingda1...@gmail.com]
> Sent: Wednesday, December 7, 2016 8:18 AM
> To: d...@pig.apache.org; user@pig.apache.org
> Subject: How to test the efficiency of multiple join
>
> Dear all,
>
> I want to tes
Hi,
I am running a multi-way join on 100 GB of TPC-DS data, using the bad join
order, on our cluster. Each time, it returns a log file with the exception below.
Has anyone run into it? Is it caused by the data exceeding the available disk space?
* org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/tmp
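If the failure does come from HDFS running out of space for the intermediate output written between the cascaded join jobs (an assumption, since the exception above is truncated), one option is to have Pig compress its temporary files. A sketch using Pig's `pig.tmpfilecompression` properties; the accepted codec names can differ between Pig versions, so check them against your release.

```pig
-- Compress the temporary files Pig writes between MapReduce jobs.
SET pig.tmpfilecompression true;
-- gz is used here; lzo is the other common choice but needs the LZO codec installed.
SET pig.tmpfilecompression.codec gz;
```

This only shrinks intermediate data; if the bad join order produces an intermediate result far larger than the inputs, freeing cluster disk space or using the better order may still be necessary.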
uld make processing of join stop after 4 records. It is not a
> good idea to add it if you are testing performance of join.
>
> On Tue, Dec 6, 2016 at 8:13 PM mingda li wrote:
>
> > Thanks for your quick reply. If so, I can use the limit operator to
> compare
> >
Dear all,
I am testing the efficiency of multi-way joins in Pig. To force the join to
execute, I use COUNT_STAR. Since counting in Pig requires a GROUP operation
first, I optimize the operation by converting the
following query:
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales B
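A minimal sketch of the count-based approach, with hypothetical relations and keys. `GROUP ... ALL` puts every joined record into a single group, so the count has to consume the entire join output (unlike LIMIT, which can stop early). Because COUNT_STAR is algebraic, Pig can compute partial counts in the combiner, so the single reduce group does not have to receive every record individually.

```pig
-- Hypothetical inputs: A joins B on k1, and the result joins C on k2.
A = LOAD 'input/A' USING PigStorage() AS (k1:int, k2:int);
B = LOAD 'input/B' USING PigStorage() AS (k1:int, v:int);
C = LOAD 'input/C' USING PigStorage() AS (k2:int, w:int);

AB  = JOIN A BY k1, B BY k1;
ABC = JOIN AB BY A::k2, C BY k2;

-- One group containing all joined records, then a single count.
grp = GROUP ABC ALL;
cnt = FOREACH grp GENERATE COUNT_STAR(ABC) AS n;
DUMP cnt;
```

The count adds one extra MapReduce job after the joins, but its output is a single number, so far less data is written than when the full join result is STOREd.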