Re: Pig Interpreter

Nihal Bhagchandani Thu, 01 Oct 2015 05:15:48 -0700

Hi, as per my understanding:
PIG: Uses a scripting language called Pig Latin, which is more workflow driven. 
It is an abstraction layer on top of MapReduce. Pig uses a batch-oriented 
framework, which means your analytic jobs will run for minutes or maybe hours, 
depending on the volume of data. Think of Pig as step-by-step SQL execution.
Spark SQL: Allows us to run SQL-like operations on data in HDFS or a file 
system. Spark is advertised as up to 100x faster than MapReduce when the work 
is done in memory, and about ten times faster on disk.


Pig is a SQL-like language that gracefully tolerates inconsistent schemas and 
runs on Hadoop.

The basic concepts in SQL map pretty well onto Pig. There are analogues for the 
major SQL keywords, and as a result you can write a query in your head as SQL 
and then translate it into Pig Latin without undue mental gymnastics.
WHERE → FILTER
The syntax is different, but conceptually this is still putting your data into 
a funnel to create a smaller dataset.
HAVING → FILTER
Because a FILTER is done in a separate step from a GROUP or an aggregation, the 
distinction between HAVING and WHERE doesn’t exist in Pig.
ORDER BY → ORDER
This keyword behaves pretty much the same in Pig as in SQL.
JOIN
In Pig, joins can have their execution specified, and they look a little 
different, but in essence these are the same joins you know from SQL, and you 
can think about them in the same way. There are INNER and OUTER joins, RIGHT 
and LEFT specifications, and even CROSS for those rare moments that you 
actually want a Cartesian product.

Because Pig is most appropriately used for data pipelines, there are often 
fewer distinct relations or tables than you would expect to see in a 
traditional normalized relational database.
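
To make the keyword mapping above concrete, here is a minimal sketch in plain 
Python (it is not Pig and does not run on Hadoop). The orders data, the column 
names and the thresholds are all made up for illustration. The first function 
runs one SQL query with WHERE, GROUP BY, HAVING and ORDER BY; the second does 
the same work as separate steps, with the roughly equivalent Pig Latin 
statement shown in a comment above each step.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, country, amount)
ORDERS = [
    ("ann", "US", 30),
    ("bob", "US", 5),
    ("cid", "DE", 50),
    ("dee", "DE", 70),
    ("eve", "FR", 20),
    ("fay", "US", 40),
]


def sql_version(rows):
    """One declarative query: WHERE, GROUP BY, HAVING, ORDER BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT country, SUM(amount) AS total
        FROM orders
        WHERE amount >= 10
        GROUP BY country
        HAVING SUM(amount) >= 50
        ORDER BY total DESC
        """
    ).fetchall()
    con.close()
    return result


def pipeline_version(rows):
    """The same result as explicit steps, the way a Pig script is laid out."""
    # big = FILTER orders BY amount >= 10;              (SQL WHERE)
    big = [r for r in rows if r[2] >= 10]
    # grouped = GROUP big BY country;
    grouped = defaultdict(list)
    for _customer, country, amount in big:
        grouped[country].append(amount)
    # totals = FOREACH grouped GENERATE group, SUM(big.amount) AS total;
    totals = [(country, sum(amounts)) for country, amounts in grouped.items()]
    # kept = FILTER totals BY total >= 50;              (SQL HAVING)
    kept = [t for t in totals if t[1] >= 50]
    # ordered = ORDER kept BY total DESC;               (SQL ORDER BY)
    return sorted(kept, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    print(sql_version(ORDERS))
    print(pipeline_version(ORDERS))
```

Note that both the WHERE and the HAVING condition become an ordinary filter 
step in the second function; the only difference is whether the filter sits 
before or after the grouping.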
Control over Execution
SQL performance tuning generally involves some fiddling with indexes, 
punctuated by the occasional yelling at an explain plan that has inexplicably 
decided to join the two largest tables first. It can mean getting a different 
plan the second time you run a query, or having the plan suddenly change after 
several weeks of use because the statistics have evolved, throwing your query’s 
performance into the proverbial toilet.

Various SQL implementations offer hints to combat this problem: you can use a 
hint to tell your SQL optimizer that it should use an index, or to force a 
given table to be first in the join order. Unfortunately, because hints are 
dependent on the particular SQL implementation, what you actually have at your 
disposal varies by platform.

Pig offers a few different ways to control the execution plan. The first is 
just the explicit ordering of operations. You can write your FILTER before your 
JOIN (the reverse of SQL’s order) and be clever about eliminating unused fields 
along the way, and have confidence that the executed order will not be worse.

Secondly, the philosophy of Pig is to allow users to choose implementations 
where multiple ones are possible. As a result, there are three specialized 
joins that can be used when the features of the data are known and a regular 
join is less appropriate. For regular joins, the order of the arguments 
dictates execution: the larger data set should appear last in this type of 
join.

As with SQL, in Pig you can pretty much ignore the performance tweaks until you 
can’t. Because of the explicit control of ordering, it can be useful to have a 
general sense of the “good” order to do things in, though Pig’s optimizer will 
also try to push up FILTERs and LIMITs, taking some of the pressure off.
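
The point about writing the FILTER before the JOIN can be illustrated with a 
small sketch, again in plain Python rather than Pig. The users and clicks 
data and the helper hash_join are hypothetical, and this is not how Pig 
executes a join internally; it only shows that filtering first gives the same 
answer while the join itself produces fewer rows.

```python
# Hypothetical sample data.
USERS = [(1, "US"), (2, "DE"), (3, "US"), (4, "FR")]  # (user_id, country)
CLICKS = [  # (user_id, page)
    (1, "/a"), (1, "/b"), (2, "/a"), (3, "/c"), (4, "/a"), (4, "/d"),
]


def hash_join(left, right, key_left, key_right):
    """Inner join of two lists of tuples on the given key positions."""
    index = {}
    for row in right:
        index.setdefault(row[key_right], []).append(row)
    return [l + r for l in left for r in index.get(l[key_left], [])]


def join_then_filter():
    """Join everything, then keep only US users (the SQL-style ordering)."""
    joined = hash_join(CLICKS, USERS, 0, 0)  # rows: (user_id, page, user_id, country)
    result = [row for row in joined if row[3] == "US"]
    return result, len(joined)


def filter_then_join():
    """Keep only US users first, then join (the ordering you can write in Pig)."""
    us_users = [u for u in USERS if u[1] == "US"]
    joined = hash_join(CLICKS, us_users, 0, 0)
    return joined, len(joined)


if __name__ == "__main__":
    late, rows_late = join_then_filter()
    early, rows_early = filter_then_join()
    print(late == early)          # same final result either way
    print(rows_late, rows_early)  # rows produced by the join in each ordering
```

On this toy data the join produces 6 rows when the filter comes last and 3 
rows when it comes first, with identical final output.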
Here is Denny Lee's post, where you can find Spark vs. Pig: 
http://dennyglee.com/2013/08/19/why-all-this-interest-in-spark/
Most of the processing that is possible through Pig can easily be achieved 
with Spark, in much less and easier-to-understand code, and since Spark works 
in memory it can be up to 100x faster than Hadoop MapReduce jobs.
Regards,
Nihal




 


     On Thursday, 1 October 2015 3:35 PM, moon soo Lee <m...@apache.org> wrote:
   

I don't know Pig very well, but it's a little difficult to see how 
Spark SQL can help Pig users. Can you explain more?

Thanks,
moon
On Thu, 1 Oct 2015 at 11:39 AM Nihal Bhagchandani 
<nihal_bhagchand...@yahoo.com> wrote:

Is there any extra advantage to having a Pig interpreter when Zeppelin already 
supports Spark SQL?
Nihal

On 01-Oct-2015, at 12:54, moon soo Lee <m...@apache.org> wrote:


Hi,

As far as I know, there is no ongoing work on a Pig interpreter, but there is 
no reason not to have one. How about filing an issue for it?

Thanks,
moon
On Wed, 23 Sep 2015 at 11:23 PM Michael Parco <33pa...@cardinalmail.cua.edu> 
wrote:

Is there any current work or plans for a Pig interpreter in Zeppelin?
