As Sean suggested, try out the new sort-based shuffle in 1.1 if you know
you're triggering large shuffles. That should help a lot.
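
For anyone following along, here is a minimal sketch of opting in (assuming Spark 1.1, where the hash-based shuffle is still the default; the app name is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Opt in to the sort-based shuffle added in Spark 1.1
    // (hash-based shuffle is still the default in this release).
    val conf = new SparkConf()
      .setAppName("shuffle-heavy-job")  // placeholder app name
      .set("spark.shuffle.manager", "sort")
    val sc = new SparkContext(conf)

The same property can also be set in spark-defaults.conf or passed to spark-submit at launch time.
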
On Friday, October 31, 2014, Bill Q wrote:
Hi Sean,
Thanks for the reply. I think both the driver and the workers have the
problem. You are right that the ulimit change fixed the "too many files
open" error on the driver side. And there is a very big shuffle. My
(perhaps naive) plan is to migrate the HQL scripts directly from Hive to
Spark SQL and get them working.
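
Since the plan is to port HQL scripts, a minimal sketch of running HiveQL through Spark SQL's HiveContext in Spark 1.1 (sc is an existing SparkContext; "src" is a placeholder Hive table):

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext reads the Hive metastore, so existing Hive tables
    // are visible without migrating the data itself.
    val hiveContext = new HiveContext(sc)

    // In 1.1 the default dialect on a HiveContext is HiveQL, so many
    // HQL statements run unchanged. "src" is a placeholder table.
    val counts = hiveContext.sql("SELECT key, COUNT(*) FROM src GROUP BY key")
    counts.collect().foreach(println)
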
It's almost surely the workers, not the driver (shell), that have too
many files open. You can change their ulimit. But it's probably better
to see why it happened -- a very big shuffle? -- and repartition or
design differently to avoid it. The new sort-based shuffle might help
in this regard.
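
To illustrate the repartitioning point, a sketch only (lines is a hypothetical RDD[String], and 200 is an arbitrary partition count to tune):

    import org.apache.spark.rdd.RDD

    // Under the hash-based shuffle, each map task writes one file per
    // reduce partition, so the open-file count grows with the number
    // of reduce partitions. Passing an explicit, modest partition
    // count to the wide operation keeps that in check.
    def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
      lines.flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _, 200)  // 200 reduce partitions: placeholder

Setting spark.shuffle.consolidateFiles=true is another way to cut the number of shuffle files under the hash-based shuffle, though the sort-based shuffle largely sidesteps the problem.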