Hi Praveen,

Thank you for sharing your work!

As far as I know, quite a few people are interested in Pig on Spark. I am
wondering whether we can collaborate to avoid duplicated effort as a
community.

Do you think we can create an umbrella JIRA for Pig on Spark and continue
the discussion there? Once we agree on the design, Pig committers are
willing to help create a feature branch and commit patches. Please let me
know what you think.

Thanks,
Cheolsoo


On Tue, Jul 15, 2014 at 7:36 AM, Praveen R <[email protected]> wrote:

> Hi Everyone,
>
> We at SigmoidAnalytics have been working on Pig on Spark for some time and
> would like to hear your thoughts about it.
>
> You can find the repo here: https://github.com/sigmoidanalytics/spork
> and the README has been updated to work with Spark 0.9. We have so far
> tested it on hadoop-1.0.4 and hadoop-2.2.0.
>
> Below are the major issues we are facing:
> 1. Sending objects from the driver to the executors. To achieve this, we
> have built a TCP server that broadcasts data to the executors:
> https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23
> 2. Large shuffle data when performing groupBy.
>
> Please feel free to file issues on the github repo or mail us at:
> [email protected].
>
> Thanks,
> Praveen R
>
