Thanks Davies and Ron! It was indeed a ulimit issue. Thanks a lot!

Best,
Baoqiang Cao
Blog: http://baoqiang.org
Email: bqcaom...@gmail.com
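P.S. For anyone who finds this thread later: below is a minimal sketch of checking (and, up to the hard limit, raising) the per-process open-file limit from Python using the stdlib resource module. The target value of 10000 is illustrative, not the exact number I used, and raising the soft limit this way only affects the current process; the Spark executors still need the /etc/security/limits.conf change (and a re-login) that Ron described.

========================================
import resource

# Current soft/hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%d hard=%d" % (soft, hard))

# The soft limit can be raised up to the hard limit without root;
# going past the hard limit requires /etc/security/limits.conf and a re-login.
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    print("raised soft limit to %d" % target)
========================================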
On Aug 11, 2014, at 3:08 AM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:
> If you're running on Ubuntu, run ulimit -n, which gives the max number of
> allowed open files. You will have to change the value in
> /etc/security/limits.conf to something like 10000, then log out and log back in.
>
> Thanks,
> Ron
>
> Sent from my iPad
>
>> On Aug 10, 2014, at 10:19 PM, Davies Liu <dav...@databricks.com> wrote:
>>
>>> On Fri, Aug 8, 2014 at 9:12 AM, Baoqiang Cao <bqcaom...@gmail.com> wrote:
>>> Hi there,
>>>
>>> I ran into a problem and can’t find a solution.
>>>
>>> I was running bin/pyspark < ../python/wordcount.py
>>
>> You could use bin/spark-submit ../python/wordcount.py instead.
>>
>>> The wordcount.py is here:
>>>
>>> ========================================
>>> import sys
>>> from operator import add
>>>
>>> from pyspark import SparkContext
>>>
>>> datafile = '/mnt/data/m1.txt'
>>>
>>> sc = SparkContext()
>>> outfile = datafile + '.freq'
>>> lines = sc.textFile(datafile, 1)
>>> counts = lines.flatMap(lambda x: x.split(' ')) \
>>>               .map(lambda x: (x, 1)) \
>>>               .reduceByKey(add)
>>> output = counts.collect()
>>>
>>> outf = open(outfile, 'w')
>>> for (word, count) in output:
>>>     outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')
>>> outf.close()
>>> ========================================
>>>
>>> The error message is here:
>>>
>>> 14/08/08 16:01:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 0)
>>> java.io.FileNotFoundException:
>>> /tmp/spark-local-20140808160150-d36b/12/shuffle_0_0_468 (Too many open files)
>>
>> This message means that Spark (the JVM) has reached the max number of open
>> files; there is an fd leak somewhere. Unfortunately, I cannot reproduce this
>> problem. Which version of Spark are you using?
>>
>>>     at java.io.FileOutputStream.open(Native Method)
>>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>>     at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
>>>     at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
>>>     at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
>>>     at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>     at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>> The m1.txt is about 4 GB, and I have >120 GB of RAM and used -Xmx120GB. It is
>>> on Ubuntu. Any help, please?
>>>
>>> Best,
>>> Baoqiang Cao
>>> Blog: http://baoqiang.org
>>> Email: bqcaom...@gmail.com
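For the archives, a sketch of how wordcount.py might look when run through bin/spark-submit as Davies suggested, not the exact file used here (Python 2 / Spark 1.x era, as in this thread; the data path is the one from the original mail, and the with-block is simply a tidier way to make sure the output file gets closed):

========================================
import sys
from operator import add

from pyspark import SparkContext

if __name__ == '__main__':
    # Data path from the original mail; can be overridden on the command line:
    #   bin/spark-submit wordcount.py /mnt/data/m1.txt
    datafile = sys.argv[1] if len(sys.argv) > 1 else '/mnt/data/m1.txt'
    outfile = datafile + '.freq'

    sc = SparkContext(appName='wordcount')
    counts = (sc.textFile(datafile)
                .flatMap(lambda line: line.split(' '))
                .map(lambda word: (word, 1))
                .reduceByKey(add))

    # collect() brings every (word, count) pair back to the driver; fine for a
    # modest vocabulary, otherwise counts.saveAsTextFile(outfile) keeps the
    # output distributed and avoids driver memory pressure.
    with open(outfile, 'w') as outf:
        for word, count in counts.collect():
            outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')

    sc.stop()
========================================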