Re: DataFrame from in memory datasets in multiple JVMs

2017-02-28 Thread John Desuvio
Since the data lives in multiple JVMs, only one of them can be the driver. So I can parallelize the data from one of the JVMs, but I don't have a way to do the same for the others. Or am I missing something?

On Tue, Feb 28, 2017 at 3:53 PM, ayan guha wrote:
> How about parallelize and then union all of

Re: DataFrame from in memory datasets in multiple JVMs

2017-02-28 Thread ayan guha
How about parallelize and then union all of them into one DataFrame?

On Wed, 1 Mar 2017 at 3:07 am, Sean Owen wrote:
> Broadcasts let you send one copy of read-only data to each executor.
> That's not the same as a DataFrame, and its nature means it doesn't make
> sense to think of them as not

Re: DataFrame from in memory datasets in multiple JVMs

2017-02-28 Thread Sean Owen
Broadcasts let you send one copy of read-only data to each executor. That's not the same as a DataFrame, and its nature means it doesn't make sense to think of them as not distributed. But consider things like broadcast hash joins, which may be what you are looking for if you really mean to join o