Pig on Spark

Praveen R Tue, 15 Jul 2014 07:37:29 -0700

Hi Everyone,

We at SigmoidAnalytics have been working on Pig on Spark for some time and
would like to hear your thoughts about it.


You can find the repo here: https://github.com/sigmoidanalytics/spork. The
README has been updated to work with Spark 0.9. So far we have tested it on
hadoop-1.0.4 and hadoop-2.2.0.

Below are some major issues we are having:
1. Sending objects from the driver to executors. We have built a TCP server to
broadcast
<https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
data to executors to achieve this.
2. Large shuffle data when performing groupBy.
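
To illustrate issue 2, here is a minimal sketch in plain Python (not Pig or
Spark code) of why a groupBy followed by an aggregation shuffles more records
than an aggregation that pre-combines per partition. The function names and
the sample data are hypothetical, and the pre-combine step assumes an
algebraic aggregation such as SUM or COUNT; it is not a description of how
spork handles the shuffle.

```python
from collections import defaultdict


def shuffle_without_combine(partitions):
    """Plain groupBy: every (key, value) record leaves its partition."""
    shuffled = [record for part in partitions for record in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def shuffle_with_combine(partitions):
    """Pre-aggregate inside each partition, then shuffle the partial sums."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        # At most one record per key leaves each partition.
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Two hypothetical input partitions of (key, count) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

plain_totals, plain_records = shuffle_without_combine(partitions)
combined_totals, combined_records = shuffle_with_combine(partitions)

print(plain_totals == combined_totals)   # same result either way
print(plain_records, combined_records)   # records crossing the shuffle
```

Both versions produce the same totals, but the second one moves fewer records
across the shuffle boundary, and the gap grows with the number of repeated
keys per partition.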

Please feel free to file issues on the GitHub repo or mail us at:
[email protected].

Thanks,
Praveen R
