Would there be a way to force the 'old' data out? Because at this point
I have to restart the shell every couple of queries to get meaningful
timings that are comparable to spark-submit.
On Jun 29, 2015 6:20 PM, "Mark Hamstra" wrote:
No. He is collecting the results of the SQL query, not the whole dataset.
The REPL does retain references to prior results, so it's not really the
best tool to be using when you want no-longer-needed results to be
automatically garbage collected.
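For what it's worth, one workaround in the Scala spark-shell (a sketch only; the query and variable names are illustrative, not from the benchmark script) is to avoid the REPL's automatic `resN` bindings, which keep every un-assigned result reachable. Holding results in a single reusable `var` and clearing it lets the old arrays become collectable:

```scala
// Every un-assigned expression in the REPL is bound to res0, res1, ...
// and those bindings keep the collected arrays strongly reachable.
// Reusing one var instead drops the old result each time.
var results: Array[org.apache.spark.sql.Row] = null

results = sqlContext.sql(
  "SELECT pageURL, pageRank FROM rankings WHERE pageRank > 1000").collect()
// ... inspect results ...

results = null   // drop the only strong reference to the old array
System.gc()      // hint the JVM to reclaim it (a hint, not a guarantee)
```

Note this only helps on the driver side; it does nothing for cached RDD/DataFrame blocks on the executors, which you would release with `unpersist()`.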
On Mon, Jun 29, 2015 at 9:13 AM, ayan guha wrote:
When you call collect, you are bringing the whole dataset back to driver
memory.
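To keep the driver footprint bounded, a common alternative (a sketch, assuming the Spark 1.3-era DataFrame API; `df` is illustrative) is to pull back only what you actually need to look at:

```scala
val df = sqlContext.sql("SELECT pageURL, pageRank FROM rankings")

// collect() materializes every result row on the driver:
val all  = df.collect()   // Array[Row] holding the entire result set

// take(n) brings back only a bounded number of rows:
val some = df.take(20)    // first 20 rows only

// aggregates run on the executors; only a single value returns:
val cnt  = df.count()
```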
On 30 Jun 2015 01:43, "hbogert" wrote:
> I'm running a query from the BigDataBenchmark, query 1B to be precise.
>
> When running this with Spark (1.3.1) + Mesos (0.21) in coarse-grained mode
> with 5 Mesos slaves, through a