Praveen,
    I created the JIRA https://issues.apache.org/jira/browse/PIG-4059 for
Pig on Spark. Similar to https://issues.apache.org/jira/browse/PIG-3446
(Pig on Tez) and https://issues.apache.org/jira/browse/PIG-3453 (Pig on
Storm), you can upload your design doc to that JIRA or write up a proposal
in cwiki and open it up for discussion there. From what I hear, more
companies are interested, so it would be better to collaborate early and
have all parties agree on the design before spending more time on the
implementation.

Regards,
Rohini



On Thu, Jul 17, 2014 at 3:53 AM, Praveen R <[email protected]>
wrote:

> Hi Cheolsoo,
>
> Thanks for your reply.
>
> For now, we felt GitHub issues would work well for the developers; once
> the number of issues grows, we will start a JIRA and file issues there.
>
> Also, we plan to send a proposal to the Pig dev group soon to get
> comments on the project.
>
>
> On Wed, Jul 16, 2014 at 5:04 AM, Cheolsoo Park <[email protected]>
> wrote:
>
> > Hi Praveen,
> >
> > Thank you for sharing your work!
> >
> > As far as I know, there are quite a few people who are interested in Pig
> > on Spark. I am wondering whether we can collaborate together to avoid
> > duplicate efforts as a community.
> >
> > Do you think we can create an umbrella JIRA for Pig on Spark and continue
> > the discussion there? Once we agree on the design, Pig committers are
> > willing to help create a feature branch and commit patches. Please let me
> > know what you think.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> > On Tue, Jul 15, 2014 at 7:36 AM, Praveen R <[email protected]>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > We at SigmoidAnalytics have been working on Pig on Spark for some time
> > > and would like to hear your thoughts about it.
> > >
> > > You can find the repo here: https://github.com/sigmoidanalytics/spork
> > > The README has been updated to work with Spark 0.9, and so far we have
> > > tested it on hadoop-1.0.4 and hadoop-2.2.0.
> > >
> > > Below are the major issues we are facing:
> > > 1. Sending objects from the driver to executors: to achieve this, we
> > >    have built a TCP server to broadcast data to executors:
> > >    <https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
> > > 2. Large shuffle data when performing groupBy.
> > >
> > > Please feel free to file issues on the GitHub repo or mail us at
> > > [email protected].
> > >
> > > Thanks,
> > > Praveen R
> > >
> >
>
