Thanks for the answer, I'm currently doing exactly that.
I'll try to sum up the usual Pandas <=> Spark DataFrame caveats soon.
Regards,
Olivier.
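
As a concrete example of one such caveat (a pandas-only sketch; the DataFrame and column names here are made up for illustration): pandas has no null representation for integer columns, so a single missing value silently upcasts the whole column to float64. The same upcast bites when a Spark DataFrame with nullable integer columns is brought back via `toPandas()`, so integer schemas can come back as doubles.

```python
# Illustration of the integer-null upcast caveat, using pandas only.
import pandas as pd

# A clean integer column keeps its integer dtype.
pdf = pd.DataFrame({"ids": [1, 2, 3]})
print(pdf["ids"].dtype)  # int64

# Introduce one missing value: the column is silently upcast to float64,
# because pandas represents the missing entry as NaN (a float).
pdf_with_null = pd.DataFrame({"ids": [1, None, 3]})
print(pdf_with_null["ids"].dtype)  # float64
```

The same effect means a round-trip through `toPandas()` is not always schema-preserving for nullable integer columns.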
On Tue, Jun 2, 2015 at 02:38, Davies Liu wrote:
> The second one sounds reasonable, I think.
>
> On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot
> wrote:
>> Hi everyone,
>> Let's assume I have a complex workflow of more than 10 datasources as input
>> - 20 computations (some creating intermediary datasets and some merging
>> everything for the final computation) - some taking on average 1 minute to
>> complete and some taking more than 30 minutes.
>> What would be f