As Sean suggested, try out the new sort-based shuffle in 1.1 if you know you're triggering large shuffles. That should help a lot.
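
For what it's worth, in 1.1 the shuffle implementation is picked with the spark.shuffle.manager property (hash is still the default). A rough sketch, assuming you launch through spark-shell / spark-submit and still want the 4096 file-handle limit from your earlier test:

    # Raise the limit in the same shell (running "ulimit -n 4096 & spark-shell"
    # backgrounds the ulimit in a subshell, so it never applies to spark-shell),
    # then opt in to the sort-based shuffle:
    ulimit -n 4096
    spark-shell --conf spark.shuffle.manager=SORT

Note this only affects the driver process; the workers' file-handle limits have to be raised on the cluster nodes themselves, as Sean mentioned.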
On Friday, October 31, 2014, Bill Q <bill.q....@gmail.com> wrote:

> Hi Sean,
> Thanks for the reply. I think both driver and worker have the problem. You
> are right that the ulimit fixed the driver side too many files open error.
>
> And there is a very big shuffle. My maybe naive thought is to migrate the
> HQL scripts directly from Hive to Spark SQL and make them work. It seems
> that it won't be that easy. Is that correct? And it seems that I had done
> that with Shark and it worked pretty well in the old days.
>
> Any suggestions if we are planning to migrate a large code base from
> Hive to Spark SQL with minimum code rewriting?
>
> Many thanks.
>
>
> Cao
>
> On Friday, October 31, 2014, Sean Owen <so...@cloudera.com> wrote:
>
>> It's almost surely the workers, not the driver (shell) that have too
>> many files open. You can change their ulimit. But it's probably better
>> to see why it happened -- a very big shuffle? -- and repartition or
>> design differently to avoid it. The new sort-based shuffle might help
>> in this regard.
>>
>> On Fri, Oct 31, 2014 at 3:25 PM, Bill Q <bill.q....@gmail.com> wrote:
>> > Hi,
>> > I am trying to make Spark SQL 1.1 to work to replace part of our ETL
>> > processes that are currently done by Hive 0.12.
>> >
>> > A common problem that I have encountered is the "Too many files open" error.
>> > Once that happened, the query just failed. I started the spark-shell by
>> > using "ulimit -n 4096 & spark-shell". And it still pops the same error.
>> >
>> > Any solutions?
>> >
>> > Many thanks.
>> >
>> >
>> > Bill
>> >
>> >
>> > --
>> > Many thanks.
>> >
>> >
>> > Bill
>
>
> --
> Many thanks.
>
>
> Bill