Hi Tamas,

I'm terribly sorry that I forgot to mention we are currently running
Spark 1.6. Thanks for your reply, though.

BR,
Todd Leo
Tamas Szuromi <tamas.szur...@odigeo.com> wrote on Saturday, 5 March 2016 at 4:33 PM:

> Hey,
> We had the same with Spark 1.5.x, and it disappeared after we upgraded to 1.6.
>
> Tamas
>
>
> On Saturday, 5 March 2016, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> Hi Spark Mailing List,
>>
>> I’m running Spark on Mesos over terabytes of text files; the job ran
>> fine until we decided to switch to Mesos fine-grained mode.
>>
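>> For reference, switching between coarse- and fine-grained scheduling on
>> Mesos comes down to the spark.mesos.coarse property; below is a minimal
>> sketch of how it can be set (the master URL and app name are placeholders,
>> not our actual values):
>>
>> // Minimal sketch: selecting Mesos fine-grained mode via SparkConf.
>> // The master URL and app name are illustrative placeholders.
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> val conf = new SparkConf()
>>   .setMaster("mesos://zk://zk1:2181/mesos") // placeholder Mesos master URL
>>   .setAppName("TextFileJob")                // hypothetical app name
>>   .set("spark.mesos.coarse", "false")       // "false" selects fine-grained mode
>> val sc = new SparkContext(conf)
>>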
>> At first glance, we spotted a massive number of task-lost errors in the logs:
>>
>> 16/03/05 04:01:20 ERROR TaskSchedulerImpl: Ignoring update with state LOST 
>> for TID 14420 because its task set is gone (this is likely the result of 
>> receiving duplicate task finished status updates)
>> 16/03/05 04:01:20 WARN TaskSetManager: Lost task 122.0 in stage 10.0 (TID 
>> 13901, ourhost.com): java.io.FileNotFoundException: 
>> /home/mesos/mesos-slave/slaves/20160222-161607-2315648778-5050-44877-S0/frameworks/20160222-183113-2332425994-5050-54405-0145/executors/20160222-161607-2315648778-5050-44877-S0/runs/62137cc2-317e-4500-982b-0007106aec40/blockmgr-16b8353c-ac6c-4019-b8e7-a16659cf6fe2/33/shuffle_2_122_0.index.8a14cde6-2877-4634-b4c2-fc9384f2ce8d
>>  (No such file or directory)
>>         at java.io.FileOutputStream.open0(Native Method)
>>         at java.io.FileOutputStream.open(FileOutputStream.java:270)
>>         at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>>         at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>>         at 
>> org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:141)
>>         at 
>> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:161)
>>         at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>         at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>         at 
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> I don’t know whether the first TaskSchedulerImpl error line is related; I
>> asked on this mailing list before, but had no luck finding the cause.
>>
>> As I dug further, I found the following OOM exception:
>>
>> 16/03/05 04:01:20 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
>> thread Thread[Executor task launch worker-83,5,main]
>> java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 
>> 160165
>>         at 
>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
>>         at 
>> org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
>>         at 
>> org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
>>         at 
>> org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
>>         at 
>> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
>>         at 
>> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
>>         at 
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
>>         at 
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>>         at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>         at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>         at 
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>         at 
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>         at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>         at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>         at 
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Does anyone know whether this is a bug, or whether some configuration of ours is wrong?
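>>
>> In case it is the latter, these are the memory-related knobs we have been
>> looking at, given that the allocation fails inside the Tungsten aggregation
>> path; a rough sketch with placeholder values (illustrative only, not our
>> exact settings and not a recommendation):
>>
>> // Sketch of memory-related settings relevant to the unified memory manager
>> // in Spark 1.6; every value below is a placeholder.
>> import org.apache.spark.SparkConf
>>
>> val conf = new SparkConf()
>>   .set("spark.executor.memory", "8g")           // per-executor heap size (placeholder)
>>   .set("spark.memory.fraction", "0.75")         // share of heap for execution + storage
>>   .set("spark.memory.storageFraction", "0.5")   // storage's share within that region
>>   .set("spark.sql.shuffle.partitions", "2000")  // more partitions, smaller per-task aggregation maps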
>>
>> BR,
>> Todd Leo
>>
>
