Hi,

I am seeing an incorrect computation of available storage memory, which is leading to executor failures. I have allocated 8g of executor memory with these params:

spark.memory.fraction=0.7
spark.memory.storageFraction=0.4

As expected, I see 5.2 GB of storage memory in the UI. However, per the MemoryStore logs, free memory is *increasing* as the RDDs get cached, when it should ideally be decreasing from 5.2 GB towards 0.
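For reference, here is a minimal sketch (plain Scala, not Spark code) of how I understand the unified pool is sized in Spark 1.6+. The 300 MB reserve is the RESERVED_SYSTEM_MEMORY_BYTES constant in UnifiedMemoryManager; the ~7.7 GB system-memory value is just my guess at what Runtime.getRuntime.maxMemory returns for an 8g heap, since it reports somewhat less than -Xmx:

// Sketch of my reading of Spark 1.6+ UnifiedMemoryManager sizing.
object MemorySizing {
  def main(args: Array[String]): Unit = {
    // Assumption: ~7.7 GB for Runtime.getRuntime.maxMemory on an 8g heap.
    val systemMemory    = (7.7 * 1024 * 1024 * 1024).toLong
    val reservedMemory  = 300L * 1024 * 1024  // RESERVED_SYSTEM_MEMORY_BYTES
    val memoryFraction  = 0.7                 // spark.memory.fraction
    val storageFraction = 0.4                 // spark.memory.storageFraction

    val unifiedPool   = ((systemMemory - reservedMemory) * memoryFraction).toLong
    // storageFraction only marks the region protected from eviction by
    // execution; the UI reports the whole unified pool as "Storage Memory",
    // since storage can borrow from execution.
    val storageRegion = (unifiedPool * storageFraction).toLong

    println(f"unified pool   = ${unifiedPool / math.pow(1024, 3)}%.1f GB")   // ~5.2 GB
    println(f"storage region = ${storageRegion / math.pow(1024, 3)}%.1f GB") // ~2.1 GB
  }
}

If that reading is right, the 5.2 GB in the UI is the whole unified pool, which is why I expected free storage to fall from 5.2 GB as blocks were cached.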
Eventually, the executor hits an OOM even though the MemoryStore reports it has 5 GB available. Executor logs:

2016-10-28 12:36:51,996 INFO [Executor task launch worker-10] executor.Executor (Logging.scala:logInfo(58)) - Running task 324.0 in stage 1.0 (TID 760)
2016-10-28 12:36:52,019 INFO [Executor task launch worker-10] spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_324 not found, computing it
2016-10-28 12:36:52,031 INFO [Executor task launch worker-8] storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_145 stored as values in memory (estimated size 65.0 MB, free 260.7 MB)*
2016-10-28 12:36:52,062 INFO [Executor task launch worker-10] storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting 426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,109 INFO [Executor task launch worker-10] storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started 37 remote fetches in 47 ms
2016-10-28 12:36:52,130 INFO [Executor task launch worker-8] executor.Executor (Logging.scala:logInfo(58)) - Finished task 145.0 in stage 1.0 (TID 581). 268313 bytes result sent to driver
2016-10-28 12:36:52,150 INFO [dispatcher-event-loop-7] executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(58)) - Got assigned task 761
2016-10-28 12:36:52,150 INFO [Executor task launch worker-8] executor.Executor (Logging.scala:logInfo(58)) - Running task 325.0 in stage 1.0 (TID 761)
2016-10-28 12:36:52,164 INFO [Executor task launch worker-8] spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_325 not found, computing it
2016-10-28 12:36:52,247 INFO [Executor task launch worker-8] storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting 426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,264 INFO [Executor task launch worker-8] storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started 37 remote fetches in 18 ms
2016-10-28 12:36:52,591 INFO [Executor task launch worker-6] storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_45 stored as values in memory (estimated size 65.0 MB, free 325.7 MB)*
2016-10-28 12:36:52,646 INFO [Executor task launch worker-6] executor.Executor (Logging.scala:logInfo(58)) - Finished task 45.0 in stage 1.0 (TID 481). 266368 bytes result sent to driver

Eventual failure logs:

2016-10-28 12:53:06,718 WARN [Executor task launch worker-13] storage.MemoryStore (Logging.scala:logWarning(70)) - Not enough space to cache rdd_10_656 in memory! (computed 45.2 MB so far)
2016-10-28 12:53:06,718 INFO [Executor task launch worker-13] storage.MemoryStore (Logging.scala:logInfo(58)) - Memory use = 5.0 GB (blocks) + 211.4 MB (scratch space shared across 103 tasks(s)) = 5.2 GB. Storage limit = 5.2 GB.
2016-10-28 12:53:06,718 INFO [Executor task launch worker-13] storage.BlockManager (Logging.scala:logInfo(58)) - Found block rdd_10_656 locally
2016-10-28 12:53:06,719 INFO [Executor task launch worker-12] storage.MemoryStore (Logging.scala:logInfo(58)) - 1 blocks selected for dropping
2016-10-28 12:53:06,720 INFO [Executor task launch worker-12] storage.BlockManager (Logging.scala:logInfo(58)) - Dropping block rdd_10_719 from memory
2016-10-28 12:53:06,720 INFO [Executor task launch worker-12] storage.BlockManager (Logging.scala:logInfo(58)) - Writing block rdd_10_719 to disk
2016-10-28 12:53:06,736 ERROR [Executor task launch worker-15] executor.Executor (Logging.scala:logError(95)) - Exception in task 657.0 in stage 4.0 (TID 4565)
java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 85027
    at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)

Is this a bug, or am I setting something wrong?

Regards,
Sushrut Ikhar
https://about.me/sushrutikhar