Re: Recent heartbeats
Thanks Patrick... I searched the archives and found the answer: tuning the Akka and GC params.

On Fri, Apr 4, 2014 at 10:35 PM, Patrick Wendell wrote:
> I answered this over on the user list...
>
> On Fri, Apr 4, 2014 at 6:13 PM, Debasish Das wrote:
> > Hi,
> >
> > Also posted it on user, but then I realized it might be more involved.
> >
> > In my ALS runs I am noticing messages that complain about heartbeats:
> >
> > 14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(17, machine1, 53419, 0) with no recent heart beats: 48476ms exceeds 45000ms
> > 14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(12, machine2, 60714, 0) with no recent heart beats: 45328ms exceeds 45000ms
> > 14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, machine3, 39496, 0) with no recent heart beats: 53259ms exceeds 45000ms
> >
> > Is this some issue with the underlying JVM that Akka runs on? Can I increase the heartbeat somehow to resolve these messages?
> >
> > Any more insight into the possible cause of the heartbeat warnings would be helpful...
> >
> > Thanks.
> > Deb
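For reference, the archived answer amounts to raising the block-manager heartbeat timeout and keeping executor GC pauses short. A minimal sketch of what that tuning can look like; the property names are those documented around Spark 0.9/1.0 and the values are purely illustrative assumptions, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: raise the BlockManager heartbeat timeout above the 45000 ms
// default seen in the warning, and pass GC flags that keep executor pauses
// short. Property names are from the Spark 0.9/1.0 docs; values are
// illustrative. On 0.9, GC flags were usually passed via SPARK_JAVA_OPTS
// rather than spark.executor.extraJavaOptions.
val conf = new SparkConf()
  .setAppName("ALS with a longer heartbeat timeout")
  .set("spark.storage.blockManagerSlaveTimeoutMs", "120000") // default 45000
  .set("spark.akka.timeout", "300")                          // seconds
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode")

val sc = new SparkContext(conf)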
Master compilation
I am synced with apache/spark master but I am getting an error in the spark/sql compilation... Is the master broken?

[info] Compiling 34 Scala sources to /home/debasish/spark_deploy/sql/core/target/scala-2.10/classes...
[error] /home/debasish/spark_deploy/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetRelation.scala:106: value getGlobal is not a member of object java.util.logging.Logger
[error] logger.setParent(Logger.getGlobal)
[error]                                 ^
[error] one error found
[error] (sql/compile:compile) Compilation failed
[error] Total time: 171 s, completed Apr 5, 2014 4:58:41 PM

Thanks.
Deb
Re: Master compilation
That method was added in Java 7. The project is on Java 6, so I think this was just an inadvertent error in a recent PR (it was the 'Spark parquet improvements' one).

I'll open a hot-fix PR after looking for other stuff like this that might have snuck in.
--
Sean Owen | Director, Data Science | London

On Sat, Apr 5, 2014 at 10:04 PM, Debasish Das wrote:
> I am synced with apache/spark master but I am getting an error in the spark/sql compilation... Is the master broken?
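For what it's worth, a Java 6-compatible alternative for the failing line would look something like the sketch below: Logger.getGlobal was only added in Java 7, but the GLOBAL_LOGGER_NAME constant has existed since Java 6, so the same logger can be looked up by name. The logger name used here is an assumption for illustration; this is not necessarily what the hot-fix PR actually does.

import java.util.logging.Logger

// Sketch of a Java 6-compatible replacement for logger.setParent(Logger.getGlobal):
// look the global logger up by name instead of calling the Java-7-only accessor.
val logger = Logger.getLogger("org.apache.spark.sql.parquet")  // assumed logger name
logger.setParent(Logger.getLogger(Logger.GLOBAL_LOGGER_NAME))  // GLOBAL_LOGGER_NAME exists since Java 6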
Re: Master compilation
I can compile with Java 7... let me try that...

On Sat, Apr 5, 2014 at 2:19 PM, Sean Owen wrote:
> That method was added in Java 7. The project is on Java 6, so I think this was just an inadvertent error in a recent PR (it was the 'Spark parquet improvements' one).
Re: Master compilation
I verified this is happening for both CDH4.5 and 1.0.4... My deploy environment is Java 6, so Java 7 compilation is not going to help...

Is this the PR which caused it?

Commit fbebaed (SPARK-1383), "Spark parquet improvements", Author: Andre Schumacher, 2 days ago:
A few improvements to the Parquet support for SQL queries:
- Instead of files, a ParquetRelation is now backed by a directory, which simplifies importing data from other sources
- InsertIntoParquetTable operation now supports switching between overwriting or appending (at least in HiveQL)
- tests now use the new API
- Parquet logging can be set to WARNING level (default)
- Default compression for Parquet files (GZIP, as in parquet-mr)

I will go back to a stable check-in before this one.

On Sat, Apr 5, 2014 at 2:22 PM, Debasish Das wrote:
> I can compile with Java 7... let me try that...
Re: Master compilation
If you want to submit a hot fix for this issue specifically, please do. I'm not sure why it didn't fail our build...

On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das wrote:
> I verified this is happening for both CDH4.5 and 1.0.4... My deploy environment is Java 6, so Java 7 compilation is not going to help...
Re: Master compilation
Will do. I'm just finishing a recompile to check for anything else like this.

The reason is that the tests run with Java 7 (like lots of us do, including me), so it used the Java 7 classpath and found the class. It's possible to use Java 7 with the Java 6 -bootclasspath. Or just use Java 6.
--
Sean Owen | Director, Data Science | London

On Sat, Apr 5, 2014 at 11:06 PM, Patrick Wendell wrote:
> If you want to submit a hot fix for this issue specifically, please do. I'm not sure why it didn't fail our build...
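A sketch of what "Java 7 with the Java 6 -bootclasspath" could look like as sbt build settings. The JAVA6_HOME environment variable and the rt.jar path are assumptions, and these flags are illustrative rather than what the Spark build itself uses:

// Build-settings sketch (sbt): compile on JDK 7 but resolve the Java class
// library against a Java 6 rt.jar, so Java-7-only APIs such as Logger.getGlobal
// fail at compile time instead of at runtime on a JRE 6 cluster.
// JAVA6_HOME is assumed to point at a local JDK 6 install.
javacOptions ++= Seq(
  "-source", "1.6", "-target", "1.6",
  "-bootclasspath", sys.env("JAVA6_HOME") + "/jre/lib/rt.jar")

scalacOptions ++= Seq(
  "-target:jvm-1.6",
  "-javabootclasspath", sys.env("JAVA6_HOME") + "/jre/lib/rt.jar")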
Re: Master compilation
@Patrick our cluster still has Java 6 deployed... and I compiled using JDK 6...

Sean is looking into it... this API is in Java 7 but not Java 6...

On Sat, Apr 5, 2014 at 3:06 PM, Patrick Wendell wrote:
> If you want to submit a hot fix for this issue specifically, please do. I'm not sure why it didn't fail our build...
ephemeral storage level in spark ?
Hi,

We have a requirement to use (potentially) ephemeral storage that is not within the JVM but is strongly tied to a worker node. The source of truth for a block would still be within Spark, but to actually do the computation we would need to copy data to the external device (where it might lie around for a while: so data locality really helps, since we can avoid a subsequent copy if the data is already present when computing on the same block again).

I was wondering if the recently added storage level for Tachyon would help in this case (note, Tachyon itself won't help; just the storage level might). What sort of guarantees does it provide? How extensible is it? Or is it strongly tied to Tachyon with only a generic name?

Thanks,
Mridul
Re: ephemeral storage level in spark ?
Hi Mridul,

Do you mean the scenario where different Spark applications need to read the same raw data, which is stored in a remote cluster or on remote machines, and the goal is to load the remote raw data only once?

Haoyuan

On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan wrote:
> We have a requirement to use (potentially) ephemeral storage that is not within the JVM but is strongly tied to a worker node.

--
Haoyuan Li
Algorithms, Machines, People Lab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
Re: ephemeral storage level in spark ?
No, I am thinking along the lines of writing to an accelerator card, or a dedicated card with its own memory.

Regards,
Mridul

On Apr 6, 2014 5:19 AM, "Haoyuan Li" wrote:
> Do you mean the scenario where different Spark applications need to read the same raw data, which is stored in a remote cluster or on remote machines, and the goal is to load the remote raw data only once?
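For context, the Tachyon-backed level the thread refers to is used like any other persistence level. A minimal sketch, assuming the level is exposed as StorageLevel.OFF_HEAP in the then-current master; whether the store behind it is pluggable enough to target device-local memory is exactly the open question above.

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Minimal sketch of persisting an RDD with the off-heap (Tachyon-backed)
// storage level; blocks are kept outside the JVM heap of the executors.
object OffHeapPersistSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "off-heap-sketch")
    val data = sc.parallelize(1 to 1000000)
    data.persist(StorageLevel.OFF_HEAP)  // assumed name of the recently added level
    println(data.count())
    sc.stop()
  }
}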
Re: Master compilation
With JDK 7 I could compile it fine:

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

What happens if I take that jar and try to deploy it on the ancient CentOS 6 default on the cluster?

java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Breeze compilation also fails with JDK 6 but runs fine with JDK 7, and the Breeze jar is already included in spark mllib with Xiangrui's sparse vector check-in.

Does that mean that classes compiled and generated using JDK 7 will run fine on JRE 6? I am confused.

On Sat, Apr 5, 2014 at 3:09 PM, Sean Owen wrote:
> Will do. I'm just finishing a recompile to check for anything else like this.
>
> The reason is that the tests run with Java 7 (like lots of us do, including me), so it used the Java 7 classpath and found the class. It's possible to use Java 7 with the Java 6 -bootclasspath. Or just use Java 6.
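To make the failure mode concrete: bytecode generated on JDK 7 with a 1.6 target will load on a Java 6 JRE, but any call to a Java-7-only API still blows up at runtime. A small illustrative sketch (not from the Spark code base):

import java.util.logging.Logger

// Compiled on JDK 7 (even with -target:jvm-1.6 / -target 1.6) this builds,
// because the compiler resolves getGlobal against the JDK 7 class library.
// On a Java 6 JRE the class loads fine, but the call below throws
// java.lang.NoSuchMethodError, since Java 6's java.util.logging.Logger has
// no getGlobal method. Pinning -bootclasspath to a Java 6 rt.jar turns this
// into a compile-time error instead.
object Java7ApiOnJava6 {
  def main(args: Array[String]): Unit = {
    val global = Logger.getGlobal  // Java-7-only API
    println(global.getName)
  }
}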
ALS array index out of bounds with 50 factors
Hi,

I deployed apache/spark master today; recently there were many ALS-related check-ins and enhancements. I am running ALS with explicit feedback, and I remember most of the enhancements were related to implicit feedback...

With 25 factors my runs were successful, but with 50 factors I am getting an array index out of bounds error...

Note that I was hitting GC errors before with an older version of Spark, but it seems like the sparse matrix partitioning scheme has changed now... data caching looks much more balanced now... earlier one node was becoming the bottleneck, although I ran with 64g memory per node...

There are around 3M products and 25M users...

Has anyone noticed this bug or something similar?

14/04/05 23:03:15 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: 81029
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1$$anonfun$apply$mcVI$sp$1.apply$mcVI$sp(ALS.scala:450)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:446)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:445)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:416)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:415)
        at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
        at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
        at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
        at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
        at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:52)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:43)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:42)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Thanks.
Deb
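For anyone trying to reproduce this, the run is a plain explicit-feedback ALS call. A minimal sketch of the setup described above; the input path, rank, iteration count, lambda, and block count are illustrative assumptions, not the exact values from the failing job:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Sketch of the kind of explicit-feedback ALS run described above
// (~25M users x ~3M products): rank 25 completes, while rank 50 hits the
// ArrayIndexOutOfBoundsException inside updateBlock. All values illustrative.
object ALSRankRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[4]", "als-rank-repro")
    val ratings = sc.textFile("hdfs:///path/to/ratings")  // assumed input location
      .map { line =>
        val Array(user, product, rating) = line.split(',')
        Rating(user.toInt, product.toInt, rating.toDouble)
      }

    val rank = 50        // 25 worked; 50 triggered the exception above
    val iterations = 10
    val lambda = 0.065
    val blocks = 64      // number of user/product blocks

    val model = ALS.train(ratings, rank, iterations, lambda, blocks)
    println(model.userFeatures.count())
    sc.stop()
  }
}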