Hi 

I'm running Spark 0.9.1 on a Hadoop cluster (CDH 4.2.1) with YARN.

I have a job that performs a few transformations on a given file and then joins
that file with another one.
The job itself finishes successfully, but some tasks fail and only succeed
after being rerun.
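
Roughly, the job has the shape below; the input paths and the record format are placeholders, not the real code:

    import org.apache.spark.SparkContext._    // pair RDD functions needed for join

    val left = sc.textFile("hdfs://.../input/events")     // the given file
      .map(_.split("\t"))                                  // a few transformations
      .filter(_.length > 1)
      .map(f => (f(0), f(1)))                              // key by the join field

    val right = sc.textFile("hdfs://.../input/other")      // the other file
      .map(_.split("\t"))
      .map(f => (f(0), f(1)))

    left.join(right)                                       // the join that shuffles
      .saveAsTextFile("hdfs://.../output")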

During development I've been experimenting with different settings, and these
are the ones currently in the code:

- additional hadoop config:
        "fs.hdfs.impl.disable.cache", "true"
- spark config set on SparkContext:

      .set("spark.test.disableBlockManagerHeartBeat", "true")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.default.parallelism", "1000")
      .set("spark.shuffle.netty.connect.timeout", "300000")
      .set("spark.storage.blockManagerSlaveTimeoutMs", "300000")
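
For reference, this is roughly how those settings end up wired together in the driver code (the app name below is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("reco-join")   // placeholder name
      .set("spark.test.disableBlockManagerHeartBeat", "true")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.default.parallelism", "1000")
      .set("spark.shuffle.netty.connect.timeout", "300000")
      .set("spark.storage.blockManagerSlaveTimeoutMs", "300000")

    val sc = new SparkContext(conf)

    // the additional Hadoop setting goes on the context's Hadoop configuration
    sc.hadoopConfiguration.set("fs.hdfs.impl.disable.cache", "true")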

When I look into the logs I see lots of error messages:

- This looks like some problem with HA, but I checked the namenodes while the
job was running and there was no failover between the active and standby
namenode (see the sketch after the log excerpt below).

14/05/14 15:25:44 ERROR security.UserGroupInformation: PriviledgedActionException as:hc_client_reco_dev (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
14/05/14 15:25:44 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
14/05/14 15:25:44 ERROR security.UserGroupInformation: PriviledgedActionException as:hc_client_reco_dev (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
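
As a point of reference (a guess at what's relevant, not necessarily our exact setup), a read that goes through the logical HA nameservice rather than a specific namenode host would look roughly like this, with placeholder names:

    // "nameservice1" and the path are made-up placeholders; with the HA client
    // configuration in place the HDFS client should pick the active namenode itself
    val input = sc.textFile("hdfs://nameservice1/path/to/input")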


- There are multiple exceptions logged at INFO level; I don't know whether this is serious:

14/05/14 15:30:06 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found
14/05/14 15:30:06 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2c34bc84
java.nio.channels.CancelledKeyException
        at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:341)
        at org.apache.spark.network.ConnectionManager$$anon$3.run(ConnectionManager.scala:98)


- I also see a few of these, which seems strange:

14/05/14 15:26:45 ERROR executor.Executor: Exception in task ID 2081
java.io.FileNotFoundException: /data/storage/1/yarn/local/usercache/hc_client_reco_dev/appcache/application_1398268932983_1221792/spark-local-20140514152006-9c62/38/shuffle_5_121_395 (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:87)
        at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:105)
        at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:265)
        at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:205)
        at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.getLocalBlocks(BlockFetcherIterator.scala:204)
        at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.initialize(BlockFetcherIterator.scala:235)
        at org.apache.spark.storage.BlockManager.getMultiple(BlockManager.scala:452)
        at org.apache.spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:77)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:125)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:115)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)


Could someone suggest any solutions to this?

Regards 
Marcin
