Hi everyone,

We at SigmoidAnalytics have been working on Pig on Spark for some time and would like to hear your thoughts about it.
You can find the repo at https://github.com/sigmoidanalytics/spork and the README has been updated to work with Spark 0.9. So far we have tested it on hadoop-1.0.4 and hadoop-2.2.0.

Below are some major issues we are having:

1. Sending objects from the driver to the executors. To achieve this, we have built a TCP server to broadcast data to the executors <https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>.
2. Large shuffle data when performing groupBy.

Please feel free to file issues on the GitHub repo or mail us at: [email protected].

Thanks,
Praveen R
