Hi all, I've set up a 4-node spark cluster (the nodes are r3.large) with the spark-ec2 script. I've been trying to run a job on this cluster, and I'm trying to figure out why I get the following exception:
java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at sun.security.ssl.InputRecord.readFully(InputRecord.java:442) at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554) at sun.security.ssl.InputRecord.read(InputRecord.java:509) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927) at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884) at sun.security.ssl.AppInputStream.read(AppInputStream.java:102) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170) at java.io.FilterInputStream.read(FilterInputStream.java:133) at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108) at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:76) at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:136) at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:98) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:75) at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) at org.apache.spark.scheduler.Task.run(Task.scala:53) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Upon inspection, the error seems to be while reading from a s3n address. The data itself is not big (around 35 megabytes) but I am partitioning it into 8 groups. Is there a way to make these kinds of reads more reliable? If not, is there a way to increase the maximum number of errors tolerated in a job before it is killed? Thanks! Dan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SocketException-when-reading-from-S3-s3n-format-tp6889.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
