Hi,

As per my understanding:

Pig: Uses a scripting language called Pig Latin, which is more workflow-driven. It is an abstraction layer on top of MapReduce. Pig uses batch-oriented frameworks, which means your analytic jobs will run for minutes or maybe hours, depending on the volume of data. Think of Pig as step-by-step SQL execution.

Spark SQL: Allows us to run SQL-like operations on data in HDFS or a file system, with up to 100x faster performance than MapReduce when the SQL runs in memory, and roughly ten times faster when it runs on disk.
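To make the "step-by-step SQL" idea concrete, here is a minimal sketch in plain Python. It is not Pig Latin or Spark code; the sample rows, field names, and thresholds are made up for illustration. Each step produces a named intermediate result, the way a Pig script assigns each operation to a relation.

```python
from collections import defaultdict

# Hypothetical sample data: (user, country, amount) rows.
orders = [
    ("ann", "US", 30),
    ("bob", "US", 5),
    ("cho", "KR", 50),
    ("dev", "IN", 20),
    ("eve", "KR", 15),
]

# The same logic as one declarative SQL statement would be roughly:
#   SELECT country, SUM(amount) AS total FROM orders
#   WHERE amount >= 10 GROUP BY country
#   HAVING SUM(amount) >= 25 ORDER BY total DESC;

# Step 1 (a filter on rows, the WHERE part): drop small orders.
big_orders = [row for row in orders if row[2] >= 10]

# Step 2 (a grouping step): collect amounts per country.
by_country = defaultdict(list)
for user, country, amount in big_orders:
    by_country[country].append(amount)

# Step 3 (an aggregation step): sum each group.
totals = {country: sum(amounts) for country, amounts in by_country.items()}

# Step 4 (a second filter, the HAVING part): keep only large totals.
large_totals = {c: t for c, t in totals.items() if t >= 25}

# Step 5 (an ordering step): sort by total, largest first.
result = sorted(large_totals.items(), key=lambda item: item[1], reverse=True)
print(result)
```

Note that steps 1 and 4 are the same kind of operation applied at different points in the pipeline, which is the sense in which a workflow-style script does not need separate WHERE and HAVING keywords.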
Pig is a SQL-like language that gracefully tolerates inconsistent schemas and runs on Hadoop. The basic concepts in SQL map pretty well onto Pig. There are analogues for the major SQL keywords, and as a result you can write a query in your head as SQL and then translate it into Pig Latin without undue mental gymnastics.

WHERE → FILTER: The syntax is different, but conceptually this is still putting your data into a funnel to create a smaller dataset.

HAVING → FILTER: Because a FILTER is done in a separate step from a GROUP or an aggregation, the distinction between HAVING and WHERE doesn’t exist in Pig.

ORDER BY → ORDER: This keyword behaves pretty much the same in Pig as in SQL.

JOIN: In Pig, joins can have their execution specified, and they look a little different, but in essence these are the same joins you know from SQL, and you can think about them in the same way. There are INNER and OUTER joins, RIGHT and LEFT specifications, and even CROSS for those rare moments that you actually want a Cartesian product.

Because Pig is most appropriately used for data pipelines, there are often fewer distinct relations or tables than you would expect to see in a traditional normalized relational database.

Control over Execution

SQL performance tuning generally involves some fiddling with indexes, punctuated by the occasional yelling at an explain plan that has inexplicably decided to join the two largest tables first. It can mean getting a different plan the second time you run a query, or having the plan suddenly change after several weeks of use because the statistics have evolved, throwing your query’s performance into the proverbial toilet.

Various SQL implementations offer hints to combat this problem: you can use a hint to tell your SQL optimizer that it should use an index, or to force a given table to be first in the join order.
Unfortunately, because hints are dependent on the particular SQL implementation, what you actually have at your disposal varies by platform.

Pig offers a few different ways to control the execution plan. The first is just the explicit ordering of operations. You can write your FILTER before your JOIN (the reverse of SQL’s order) and be clever about eliminating unused fields along the way, and have confidence that the executed order will not be worse.

Secondly, the philosophy of Pig is to allow users to choose implementations where multiple ones are possible. As a result, there are three specialized joins that can be used when the features of the data are known and make a regular join less appropriate. For regular joins, the order of the arguments dictates execution: the larger data set should appear last in this type of join.

As with SQL, in Pig you can pretty much ignore the performance tweaks until you can’t. Because of the explicit control of ordering, it can be useful to have a general sense of the “good” order to do things in, though Pig’s optimizer will also try to push up FILTERs and LIMITs, taking some of the pressure off.

Here is Denny Lee's link, where you can find Spark vs Pig: http://dennyglee.com/2013/08/19/why-all-this-interest-in-spark/

Most of the tasks and processing that are possible through Pig can easily be achieved with Spark, in much less code that is easier to understand, and since Spark works in memory it can be up to 100x faster than Hadoop MapReduce tasks.

Regards,
Nihal

On Thursday, 1 October 2015 3:35 PM, moon soo Lee <m...@apache.org> wrote:

I don't know Pig very well, but it's a little bit difficult to think how Spark SQL can help Pig users. Can you explain more?

Thanks,
moon

On Thu, 1 Oct 2015 at 11:39 AM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

Is there any extra advantage to having a Pig interpreter when Zeppelin already supports Spark SQL?
Nihal

Sent from my iPhone

On 01-Oct-2015, at 12:54, moon soo Lee <m...@apache.org> wrote:

Hi,

As far as I know, there is no ongoing work on a Pig interpreter, but there is no reason not to have one. How about filing an issue for it?

Thanks,
moon

On Wed, 23 Sep 2015 at 11:23 PM, Michael Parco <33pa...@cardinalmail.cua.edu> wrote:

Is there any current work or plans for a Pig interpreter in Zeppelin?