Re: Re: trouble understanding data frame memory usage "java.io.IOException: Unable to acquire memory"

Chris Fregly Wed, 30 Dec 2015 10:05:10 -0800

@Jim-

I'm wondering if those docs are outdated, as it's my understanding (please 
correct me if I'm wrong) that we should never be seeing OOMs: 1.5/Tungsten not 
only reduced the memory footprint of our data, but also introduced better 
task-level, and even key-level, external spilling before an OOM occurs.


Michael's comment seems to indicate there was a missed case that is fixed in 
1.6.

I personally see far too many OOMs for what I expected out of 1.5, so I'm 
anxious to try 1.6 and hopefully squash more of these edge cases.

While increasing parallelism is definitely a best practice and applies either 
way, I feel the docs could use some updating by the contributors of this code.
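
For what it's worth, the 1.6 formula Davies gives further down the thread can 
be worked through for Andy's setup (a 1024M heap and two concurrent tasks). 
The snippet below is only a rough sketch of that arithmetic in plain Java: the 
class and method names are made up for illustration, the 300M reserve and the 
0.75 fraction are taken from Davies's message, and the real allocation inside 
Spark may well differ.

```java
/** Hypothetical helper: estimates per-task memory with the formula quoted in this thread. */
final class TaskMemoryEstimate {
    private static final long MIB = 1024L * 1024L;

    // Assumption taken from Davies's message: 300M of the heap is set aside.
    private static final long RESERVED_BYTES = 300L * MIB;

    /**
     * Computes (heap - 300M) * 0.75 / concurrentTasks, in bytes.
     * Returns 0 when the heap is smaller than the reserve.
     */
    static long perTaskBytes(long heapBytes, int concurrentTasks) {
        if (concurrentTasks <= 0) {
            throw new IllegalArgumentException("concurrentTasks must be positive");
        }
        long usable = Math.max(0L, heapBytes - RESERVED_BYTES);
        // Divide before multiplying so that very large heap values cannot overflow.
        return usable / 4 * 3 / concurrentTasks;
    }

    public static void main(String[] args) {
        // Andy's reported setup: -Xmx1024M with two concurrent tasks.
        long configured = perTaskBytes(1024L * MIB, 2);
        System.out.println("1024M heap, 2 tasks: " + configured + " bytes per task");

        // The same estimate for whatever heap limit this JVM was started with.
        long actual = perTaskBytes(Runtime.getRuntime().maxMemory(), 2);
        System.out.println("this JVM, 2 tasks: " + actual + " bytes per task");
    }
}
```

With a 1024M heap and two tasks this comes to roughly 271.5 MB per task, far 
more than a 672K input file, which is at least consistent with Michael's point 
that the problem is operators not spilling rather than the data being too big.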

> On Dec 30, 2015, at 12:18 PM, SparkUser <[email protected]> wrote:
> 
> Sounds like you guys are on the right track, this is purely FYI because I 
> haven't seen it posted, just responding to the line in the original post that 
> your data structure should fit in memory.
> 
> OK two more disclaimers "FWIW" and "maybe this is not relevant or already 
> covered" OK here goes...
> 
>  from 
> http://spark.apache.org/docs/latest/tuning.html#memory-usage-of-reduce-tasks
> 
> Sometimes, you will get an OutOfMemoryError not because your RDDs don’t fit 
> in memory, but because the working set of one of your tasks, such as one of 
> the reduce tasks in groupByKey, was too large. Spark’s shuffle operations 
> (sortByKey, groupByKey, reduceByKey, join, etc) build a hash table within 
> each task to perform the grouping, which can often be large. The simplest fix 
> here is to increase the level of parallelism, so that each task’s input set 
> is smaller. Spark can efficiently support tasks as short as 200 ms, because 
> it reuses one executor JVM across many tasks and it has a low task launching 
> cost, so you can safely increase the level of parallelism to more than the 
> number of cores in your clusters.
> 
> I would be curious if that helps at all. Sounds like an interesting problem 
> you are working on.
> 
> Jim
> 
>> On 12/29/2015 05:51 PM, Davies Liu wrote:
>> Hi Andy,  
>> 
>> Could you change the logging level to INFO and post some of the output here? 
>> When a task hits an OOM, it logs some information about its memory usage.
>> 
>> In 1.6, the memory for a task is: (HeapSize - 300M) * 0.75 / number of 
>> tasks. Is it possible that the heap is too small?
>> 
>> Davies  
>> 
>> --  
>> Davies Liu
>> 
>> On Tuesday, December 29, 2015 at 4:28 PM, Andy Davidson wrote:
>> 
>>> Hi Michael
>>>  
>>> https://github.com/apache/spark/archive/v1.6.0.tar.gz
>>>  
>>> With both 1.6.0 and 1.5.2, my unit test works when I call repartition(1) 
>>> before saving the output. coalesce(1) still fails.
>>>  
>>> Coalesce(1) spark-1.5.2
>>> Caused by:
>>> java.io.IOException: Unable to acquire 33554432 bytes of memory
>>>  
>>>  
>>> Coalesce(1) spark-1.6.0
>>>  
>>> Caused by:  
>>> java.lang.OutOfMemoryError: Unable to acquire 28 bytes of memory, got 0
>>>  
>>> Hope this helps
>>>  
>>> Andy
>>>  
>>> From: Michael Armbrust <[email protected]>
>>> Date: Monday, December 28, 2015 at 2:41 PM
>>> To: Andrew Davidson <[email protected]>
>>> Cc: "user @spark" <[email protected]>
>>> Subject: Re: trouble understanding data frame memory usage 
>>> "java.io.IOException: Unable to acquire memory"
>>>  
>>>> Unfortunately, in 1.5 we didn't force operators to spill when they ran out 
>>>> of memory, so there is not a lot you can do. It would be awesome if you 
>>>> could test with 1.6 and see if things are any better.
>>>>  
>>>> On Mon, Dec 28, 2015 at 2:25 PM, Andy Davidson <[email protected]> wrote:
>>>>> I am using Spark 1.5.1. I am running into some memory problems with a 
>>>>> Java unit test. Yes, I could fix it by setting -Xmx (it's set to 1024M); 
>>>>> however, I want to better understand what is going on so I can write 
>>>>> better code in the future. The test runs on a Mac, master="local[2]".
>>>>>  
>>>>> I have a Java unit test that starts by reading a 672K ASCII file. My 
>>>>> output data file is 152K. It seems strange that such a small amount of 
>>>>> data would cause an out-of-memory exception. I am running a pretty 
>>>>> standard machine learning process:
>>>>>  
>>>>> 1. Load data
>>>>> 2. Create an ML pipeline
>>>>> 3. Transform the data
>>>>> 4. Train a model
>>>>> 5. Make predictions
>>>>> 6. Join the predictions back to my original data set
>>>>> 7. Coalesce(1); I only have a small amount of data and want to save it in 
>>>>>    a single file
>>>>> 8. Save final results back to disk
>>>>>  
>>>>>  
>>>>> Step 7: the call to coalesce() fails with “java.io.IOException: Unable to 
>>>>> acquire memory”
>>>>>  
>>>>> To try to figure out what is going on, I added log messages to count the 
>>>>> number of partitions.
>>>>>  
>>>>> Turns out I have 20 input files, each one winds up in a separate 
>>>>> partition. Okay so after loading I call coalesce(1) and check to make 
>>>>> sure I only have a single partition.
>>>>>  
>>>>> The total number of observations is 1998.
>>>>>  
>>>>> After step 7 I counted the number of partitions and discovered I have 224 
>>>>> partitions! That is surprising, given that I called coalesce(1) before I 
>>>>> did anything with the data. My data set should easily fit in memory. When 
>>>>> I save it to disk, 202 files are created, 162 of them empty!
>>>>>  
>>>>> In general I am not explicitly using cache.
>>>>>  
>>>>> Some of the data frames get registered as tables. I find it easier to use 
>>>>> sql.
>>>>>  
>>>>> Some of the data frames get converted back to RDDs. I find it easier to 
>>>>> create RDD<LabeledPoint> this way
>>>>>  
>>>>> I put calls to unpersist(true) in several places.
>>>>>  
>>>>>  
>>>>> private void memoryCheck(String name) {
>>>>>     Runtime rt = Runtime.getRuntime();
>>>>>     // Log the heap currently allocated to the JVM and the free part of it.
>>>>>     logger.warn("name: {} \t\ttotalMemory: {} \tfreeMemory: {}",
>>>>>             name,
>>>>>             String.format("%,d", rt.totalMemory()),
>>>>>             String.format("%,d", rt.freeMemory()));
>>>>> }
>>>>>
>>>>> Any idea how I can get a better understanding of what is going on? My 
>>>>> goal is to learn to write better spark code.
>>>>>  
>>>>> Kind regards
>>>>>  
>>>>> Andy
>>>>>  
>>>>> Memory usages at various points in my unit test
>>>>>  
>>>>> name: rawInput totalMemory: 447,741,952 freeMemory: 233,203,184
>>>>> name: naiveBayesModel totalMemory: 509,083,648 freeMemory: 403,504,128
>>>>> name: lpRDD totalMemory: 509,083,648 freeMemory: 402,288,104
>>>>> name: results totalMemory: 509,083,648 freeMemory: 368,011,008
>>>>>
>>>>> DataFrame exploreDF = results.select(results.col("id"),
>>>>>         results.col("label"),
>>>>>         results.col("binomialLabel"),
>>>>>         results.col("labelIndex"),
>>>>>         results.col("prediction"),
>>>>>         results.col("words"));
>>>>> exploreDF.show(10);
>>>>>
>>>>> Yes, I realize it's strange to switch styles; however, this should not 
>>>>> cause memory problems.
>>>>>
>>>>> final String exploreTable = "exploreTable";
>>>>> exploreDF.registerTempTable(exploreTable);
>>>>> String fmt = "SELECT * FROM %s where binomialLabel = 'signal'";
>>>>> String stmt = String.format(fmt, exploreTable);
>>>>> DataFrame subsetToSave = sqlContext.sql(stmt);// .show(100);
>>>>>
>>>>> name: subsetToSave totalMemory: 1,747,451,904 freeMemory: 1,049,447,144
>>>>>
>>>>> exploreDF.unpersist(true); does not resolve the memory issue.
