Thanks for your reply. The default max open files limit was 4096, which apparently isn't enough for Spark. I followed these instructions (https://easyengine.io/tutorials/linux/increase-open-files-limit/) and increased it to a much larger value. It works perfectly now. Thanks.
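
In case it helps anyone else who hits this, the gist of that page is roughly the following, assuming a typical Linux box. The 65535 value and the "zeppelin" user name are just examples I picked, not anything prescribed:

    # check the current open-files soft limit
    ulimit -n

    # raise it for the current shell session only
    ulimit -n 65535

    # to make it permanent, add lines like these to /etc/security/limits.conf
    # ("zeppelin" stands for whatever user runs the Zeppelin/Spark process),
    # then log out and back in
    zeppelin soft nofile 65535
    zeppelin hard nofile 65535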
On Tue, Dec 22, 2015 at 10:01 PM, Alexander Bezzubov <b...@apache.org> wrote:

> Hi,
>
> welcome to the Zeppelin community!
>
> It looks like you are doing everything right but have hit a
> platform-specific issue: Spark is running into the open-files limit
> on your OS.
>
> This should not happen, so could you please check what the current
> open-file limit is in your environment/OS and (just in case) cross-check
> the Spark-specific mailing list, in case this is some kind of known issue.
>
> --
> Alex
>
> On Wed, Dec 23, 2015, 10:07 Amirhossein Aleyasin <amir.8...@gmail.com>
> wrote:
>
>> Hello,
>> I am new to Zeppelin. I just installed it and tried to run the tutorial
>> example.
>> The "load data into Table" part works perfectly, but when I submit the
>> sample queries, it throws the following exception:
>>
>> java.io.FileNotFoundException:
>> /tmp/blockmgr-5d2c5999-5593-4f83-9d6d-3c290523ce29/3f/temp_shuffle_102ac16f-b5c6-4cc4-9c8e-b6bc66f17eb5
>> (Too many open files)
>>     at java.io.FileOutputStream.open(Native Method)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>     at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> This is the load-table code:
>>
>> import org.apache.commons.io.IOUtils
>> import java.net.URL
>> import java.nio.charset.Charset
>>
>> // Zeppelin creates and injects sc (SparkContext) and sqlContext
>> // (HiveContext or SQLContext), so you don't need to create them manually.
>>
>> // load bank data
>> val bankText = sc.parallelize(
>>     IOUtils.toString(
>>         new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
>>         Charset.forName("utf8")).split("\n"))
>>
>> case class Bank(age: Integer, job: String, marital: String,
>>     education: String, balance: Integer)
>>
>> val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
>>     s => Bank(s(0).toInt,
>>         s(1).replaceAll("\"", ""),
>>         s(2).replaceAll("\"", ""),
>>         s(3).replaceAll("\"", ""),
>>         s(5).replaceAll("\"", "").toInt
>>     )
>> ).toDF()
>> bank.registerTempTable("bank")
>>
>> And this is the query:
>>
>> %sql
>> select age, count(1) value
>> from bank
>> where age < 30
>> group by age
>> order by age
>>
>> Any help appreciated.
>>
>> Thanks
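
P.S. For anyone who wants to verify the new limit actually applies: a process started before the change (or launched by a daemon rather than a login shell) will not see the limits.conf values until it is restarted, so it is worth checking the running Zeppelin/Spark JVM directly. Assuming a Linux /proc filesystem, with <pid> as a placeholder for the JVM's process id:

    # show the effective open-files limit of the running process
    grep "open files" /proc/<pid>/limits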