Would there be a way to force the 'old' data out? Because at this point
I'll have to restart the shell every couple of queries to get meaningful
timings that are comparable to spark-submit.
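For what it's worth, one thing I could try (just a sketch, assuming the
implicit resN bindings of the REPL are what keep the old arrays alive) is to
hold each result in my own var and drop the reference before the next run:

    scala> var rows = sqlContext.sql("SELECT pageURL, pageRank FROM rankings where pageRank > 100").collect()
    scala> println(rows.length)   // inspect the result (e.g. just the row count)
    scala> rows = null            // drop my own reference so the array can be reclaimed
    scala> System.gc()            // only a hint; results already bound to resN stay pinned
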
On Jun 29, 2015 6:20 PM, "Mark Hamstra" <m...@clearstorydata.com> wrote:

> No.  He is collecting the results of the SQL query, not the whole
> dataset.  The REPL does retain references to prior results, so it's not
> really the best tool to be using when you want no-longer-needed results to
> be automatically garbage collected.
>
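> As a rough illustration (not from this thread, just the default Scala REPL
> behaviour): every evaluated expression gets bound to a fresh resN value, so
> each collected Array[Row] stays reachable until the shell is restarted:
>
>     scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings where pageRank > 100").collect   // bound to res0
>     scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings where pageRank > 100").collect   // bound to res1; res0 still pins the first array
>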
> On Mon, Jun 29, 2015 at 9:13 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> When you call collect, you are bringing the whole dataset back to driver
>> memory.
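>> If only the timing (or a small sample) is needed, a sketch that avoids
>> pulling every matching row back to the driver could be:
>>
>>     scala> sqlContext.sql("SELECT count(*) FROM rankings where pageRank > 100").collect()         // a single row comes back
>>     scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings where pageRank > 100").take(20) // only a small sample comes back
>>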
>> On 30 Jun 2015 01:43, "hbogert" <hansbog...@gmail.com> wrote:
>>
>>> I'm running a query from the BigDataBenchmark, query 1B to be precise.
>>>
>>> When running this with Spark (1.3.1) + Mesos (0.21) in coarse-grained mode
>>> with 5 Mesos slaves, through a spark-shell, all is well.
>>> However, rerunning the query a few times:
>>>     scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings where pageRank > 100").collect
>>> builds up a lot of memory in the spark-shell process, up to the point that
>>> 19GB (spark.driver.memory=30GB) is full, and then the same collect of the
>>> above query goes from approx. 10s to 40+s, with obvious stalls (garbage
>>> collection).
>>> Am I doing something wrong? Why isn't Spark releasing the results'
>>> memory? I'm not saving them anywhere by using .collect, am I?
>>>
>>> I'm loading the following file and then executing its _loadRankings_
>>> method:
>>> http://pastebin.com/rzJmWDxJ
>>>
>>>
>>> Hope someone can clarify this.
>>>
>>>
>>> PS
>>> Java 1.7.0 is used; if more environment info is needed, please let me
>>> know.
>>>
>>>
>>>
>
