Thanks for the answer, I'm currently doing exactly that.
I'll try to sum up the usual Pandas <=> Spark DataFrame caveats soon.
Regards,
Olivier.
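
As a concrete example of one such caveat (a pandas-only sketch; the DataFrame and column names here are made up for illustration): pandas has no null representation for integer columns, so a single missing value silently upcasts the whole column to float64. The same upcast bites when a Spark DataFrame with nullable integer columns is brought back via `toPandas()`, so integer schemas can come back as doubles.

```python
# Illustration of the integer-null upcast caveat, using pandas only.
import pandas as pd

# A clean integer column keeps its integer dtype.
pdf = pd.DataFrame({"ids": [1, 2, 3]})
print(pdf["ids"].dtype)  # int64

# Introduce one missing value: the column is silently upcast to float64,
# because pandas represents the missing entry as NaN (a float).
pdf_with_null = pd.DataFrame({"ids": [1, None, 3]})
print(pdf_with_null["ids"].dtype)  # float64
```

The same effect means a round-trip through `toPandas()` is not always schema-preserving for nullable integer columns.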
On Tue, Jun 2, 2015 at 02:38, Davies Liu wrote:
> The second one sounds reasonable, I think.
>
> On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot
> wrote:
>> Hi everyone,
>> Let's assume I have a complex workflow of more than 10 datasources as input
>> - 20 computations (some creating intermediary datasets and some merging
>> everything for the final computation) - some taking on average 1 minute to
>> complete and some taking more than 30 minutes.
>> What would be f