Re: Too many files open with Spark 1.1 and CDH 5.1

2014-10-31 Thread Nicholas Chammas
As Sean suggested, try out the new sort-based shuffle in 1.1 if you know you're triggering large shuffles. That should help a lot.

On Friday, October 31, 2014, Bill Q wrote:
> Hi Sean,
> Thanks for the reply. I think both driver and worker have the problem. You
> are right that the ulimit fixed the driver side too many files open error.
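A minimal sketch of enabling the sort-based shuffle in Spark 1.1, assuming a standard spark-submit deployment. `spark.shuffle.manager` is the actual setting; the class name and jar below are placeholders:

```shell
# Enable the sort-based shuffle for one job via spark-submit
# (in Spark 1.1 the default is the hash-based shuffle).
# com.example.MyJob and my-job.jar are placeholders.
spark-submit \
  --conf spark.shuffle.manager=sort \
  --class com.example.MyJob \
  my-job.jar

# Or set it cluster-wide in conf/spark-defaults.conf:
# spark.shuffle.manager   sort
```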

Re: Too many files open with Spark 1.1 and CDH 5.1

2014-10-31 Thread Bill Q
Hi Sean, Thanks for the reply. I think both the driver and the workers have the problem. You are right that the ulimit fixed the driver-side "too many files open" error. And there is a very big shuffle. My perhaps naive thought is to migrate the HQL scripts directly from Hive to Spark SQL and make them work.
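For reference, a quick shell sketch of checking and raising the per-process open-file limit for the session that launches the driver. The value 4096 is illustrative; worker limits typically have to be raised in `/etc/security/limits.conf` (or the service's init script) rather than interactively:

```shell
# Show the current soft and hard limits on open file descriptors.
ulimit -S -n
ulimit -H -n

# Raise the soft limit for this session (the driver inherits it when
# launched from this shell). 4096 is an illustrative value; going
# above the hard limit requires root.
ulimit -S -n 4096 2>/dev/null || true
ulimit -n
```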

Re: Too many files open with Spark 1.1 and CDH 5.1

2014-10-31 Thread Sean Owen
It's almost surely the workers, not the driver (shell), that have too many files open. You can change their ulimit. But it's probably better to see why it happened -- a very big shuffle? -- and repartition or redesign to avoid it. The new sort-based shuffle might help in this regard.
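For a sense of scale: with the hash-based shuffle (the default before the sort-based one, with file consolidation off), each map task writes one file per reduce partition, so a shuffle creates on the order of M × R files across the cluster. The task counts below are illustrative:

```shell
# Illustrative task counts for a "very big shuffle".
MAP_TASKS=1000
REDUCE_TASKS=1000

# Hash-based shuffle: one file per (map task, reduce partition) pair.
echo $((MAP_TASKS * REDUCE_TASKS))   # prints 1000000
```

With a million shuffle files, even a generous per-process descriptor limit on the workers is easy to exhaust, which is why repartitioning or switching to the sort-based shuffle attacks the cause rather than the symptom.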