Thanks for your reply. The default max open files limit was 4096, which was
not enough for Spark. I followed this guide (
https://easyengine.io/tutorials/linux/increase-open-files-limit/) and
increased it to a larger value. It works perfectly now. Thanks.
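In case it helps anyone hitting the same error, checking and raising the limit looks roughly like this (the 65535 value and the "zeppelin" username are just examples; adjust for your setup and distribution):

```shell
# Show the current soft limit on open files for this shell
ulimit -n

# Show the hard limit (the ceiling a non-root user can raise the soft limit to)
ulimit -Hn

# Raise the soft limit for the current session only
ulimit -n 65535

# For a persistent change, add entries to /etc/security/limits.conf,
# e.g. for the user that runs Zeppelin/Spark (username is an example):
#   zeppelin  soft  nofile  65535
#   zeppelin  hard  nofile  65535
# Then log out and back in so pam_limits applies the new values.
```

After that, restart Zeppelin so the Spark interpreter process starts under the new limit.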



On Tue, Dec 22, 2015 at 10:01 PM, Alexander Bezzubov <b...@apache.org> wrote:

> Hi,
>
> welcome to the Zeppelin community!
>
> It looks like you are doing everything right but are hitting a
> platform-specific issue: Spark is exceeding the open file limit on
> your OS.
>
> This should not happen, so could you please check the current open
> file limit in your environment/OS and (just in case) cross-check the
> Spark-specific mailing list, in case this is a known issue.
>
> --
> Alex
>
> On Wed, Dec 23, 2015, 10:07 Amirhossein Aleyasin <amir.8...@gmail.com>
> wrote:
>
>> Hello,
>> I am new to zeppelin, I just installed it and tried to run the tutorial
>> example.
>> The "load data into Table" part works perfectly, but when I tried to
>> submit the sample queries, it threw the following exception:
>>
>>
>> java.io.FileNotFoundException:
>> /tmp/blockmgr-5d2c5999-5593-4f83-9d6d-3c290523ce29/3f/temp_shuffle_102ac16f-b5c6-4cc4-9c8e-b6bc66f17eb5
>> (Too many open files)
>>     at java.io.FileOutputStream.open(Native Method)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>     at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>
>> This is the load table code:
>>
>> import org.apache.commons.io.IOUtils
>> import java.net.URL
>> import java.nio.charset.Charset
>>
>> // Zeppelin creates and injects sc (SparkContext) and sqlContext
>> // (HiveContext or SQLContext), so you don't need to create them manually
>>
>> // load bank data
>> val bankText = sc.parallelize(
>>     IOUtils.toString(
>>         new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
>>         Charset.forName("utf8")).split("\n"))
>>
>> case class Bank(age: Integer, job: String, marital: String, education:
>> String, balance: Integer)
>>
>> val bank = bankText.map(s => s.split(";")).filter(s => s(0) !=
>> "\"age\"").map(
>>     s => Bank(s(0).toInt,
>>             s(1).replaceAll("\"", ""),
>>             s(2).replaceAll("\"", ""),
>>             s(3).replaceAll("\"", ""),
>>             s(5).replaceAll("\"", "").toInt
>>         )
>> ).toDF()
>> bank.registerTempTable("bank")
>>
>>
>> and this is the query:
>>
>> %sql
>> select age, count(1) value
>> from bank
>> where age < 30
>> group by age
>> order by age
>>
>>
>> Any help appreciated.
>>
>> Thanks
>>
>>
>>