Hi there,

I ran into a problem and can't find a solution.
I was running

bin/pyspark < ../python/wordcount.py

The wordcount.py is here:
========================================
import sys
from operator import add

from pyspark import SparkContext

datafile = '/mnt/data/m1.txt'

sc = SparkContext()
outfile = datafile + '.freq'
lines = sc.textFile(datafile, 1)
counts = lines.flatMap(lambda x: x.split(' ')) \
              .map(lambda x: (x, 1)) \
              .reduceByKey(add)
output = counts.collect()

outf = open(outfile, 'w')

for (word, count) in output:
    outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')
outf.close()
========================================

The error message is here:

14/08/08 16:01:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 0)
java.io.FileNotFoundException: /tmp/spark-local-20140808160150-d36b/12/shuffle_0_0_468 (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
        at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
        at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The m1.txt is about 4 GB, and I have >120 GB of RAM and used -Xmx120GB. It is on Ubuntu. Any help, please?

Best,
Baoqiang Cao
Blog: http://baoqiang.org
Email: bqcaom...@gmail.com