As Sean suggested, try out the new sort-based shuffle in 1.1 if you know
you're triggering large shuffles. That should help a lot.
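
For reference, a rough sketch of turning it on when building the
SparkContext yourself; the same properties can go into
conf/spark-defaults.conf so spark-shell picks them up. The app name and
partition count below are only illustrative, not from this thread:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("etl-job")                          // placeholder name
    .set("spark.shuffle.manager", "sort")           // sort-based shuffle, new in 1.1 (hash is still the default)
    .set("spark.shuffle.consolidateFiles", "true")  // only relevant if you stay on the hash shuffle
    .set("spark.sql.shuffle.partitions", "200")     // shuffle parallelism for Spark SQL joins/aggregations
  val sc = new SparkContext(conf)

The sort shuffle writes far fewer intermediate files per map task than the
hash shuffle, which is usually what drives "too many files open" on the
workers.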
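
On the migration question quoted below: in 1.1 the usual route is
HiveContext, which speaks HiveQL, so many existing scripts can run with
little change. A minimal sketch, assuming a Spark build with Hive support
(-Phive) and your hive-site.xml on the classpath so the existing metastore
is reused; the table and query are placeholders:

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)  // sc is spark-shell's SparkContext

  // HiveContext defaults to the HiveQL dialect in 1.1, so most statements
  // run as-is; custom SerDes and some UDFs may need extra work.
  val result = hiveContext.sql("SELECT key, count(*) AS cnt FROM src GROUP BY key")
  result.take(10).foreach(println)

Queries with big joins or group-bys will still shuffle heavily, though, so
the ulimit and shuffle settings above matter no matter how the SQL gets in.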

On Friday, October 31, 2014, Bill Q <bill.q....@gmail.com> wrote:

> Hi Sean,
> Thanks for the reply. I think both the driver and the workers have the
> problem. You are right that the ulimit fixed the driver-side "too many
> files open" error.
>
> And there is a very big shuffle. My perhaps naive thought was to migrate
> the HQL scripts directly from Hive to Spark SQL and have them work. It
> seems that won't be that easy. Is that correct? I did do that with Shark
> in the old days and it worked pretty well.
>
> Any suggestions for migrating a large code base from Hive to Spark SQL
> with minimal code rewriting?
>
> Many thanks.
>
>
> Cao
>
> On Friday, October 31, 2014, Sean Owen <so...@cloudera.com> wrote:
>
>> It's almost surely the workers, not the driver (shell), that have too
>> many files open. You can change their ulimit. But it's probably better
>> to see why it happened -- a very big shuffle? -- and repartition or
>> design differently to avoid it. The new sort-based shuffle might help
>> in this regard.
>>
>> On Fri, Oct 31, 2014 at 3:25 PM, Bill Q <bill.q....@gmail.com> wrote:
>> > Hi,
>> > I am trying to make Spark SQL 1.1 work to replace part of our ETL
>> > processes that are currently done by Hive 0.12.
>> >
>> > A common problem that I have encountered is the "Too many files open"
>> > error. Once that happens, the query just fails. I started spark-shell
>> > with "ulimit -n 4096 & spark-shell", and it still hits the same error.
>> >
>> > Any solutions?
>> >
>> > Many thanks.
>> >
>> >
>> > Bill
>> >
>> >
>> >
>> > --
>> > Many thanks.
>> >
>> >
>> > Bill
>> >
>>
>
>
> --
> Many thanks.
>
>
> Bill
>
>
