Thanks Krishna, but I believe memory on the executors is being exhausted in my case. I've already allocated the maximum 10g I can to both the driver and the executors. Are there any alternative solutions for fetching the top 1M rows after ordering the dataset?
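
In case it helps frame the question, one alternative I've been considering
is to avoid the large LIMIT entirely: filter on a row index instead and
write the result straight to parquet, so the full result never has to land
on a single executor or the driver. A minimal sketch against the Spark 1.5
API (sqlContext is the usual shell context; "tbl", "c1" and the output path
are placeholders for my actual table, sort column and destination):

    // Global sort, same as before, but without the LIMIT.
    val sorted = sqlContext.sql("select * from tbl order by c1")

    // The range partitioning from the sort means partition order matches
    // global sort order, so the first 1M indices are the top 1M rows.
    // zipWithIndex keeps the rows distributed across the executors.
    val topRows = sorted.rdd
      .zipWithIndex()
      .filter { case (_, idx) => idx < 1000000L }
      .map { case (row, _) => row }

    // Write straight to parquet so the result never has to fit in one place.
    sqlContext.createDataFrame(topRows, sorted.schema)
      .write
      .parquet("/path/to/output")

My understanding is that zipWithIndex costs one extra job to compute the
per-partition offsets, but everything stays on the executors, so driver
memory and maxResultSize stop being the bottleneck. Does that sound right,
or is there a better pattern for this?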
Thanks!

On Fri, Apr 29, 2016 at 6:01 PM, Krishna <research...@gmail.com> wrote:

> I recently encountered similar network-related errors and was able to fix
> them by applying the ethtool updates described here:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085
>
> On Friday, April 29, 2016, Buntu Dev <buntu...@gmail.com> wrote:
>
>> Just to provide more details, I have 200 blocks (parquet files) with an
>> average block size of 70M. Limiting the result set to 100k ("select * from
>> tbl order by c1 limit 100000") works, but when I increase it to 1M I keep
>> running into this error:
>>
>> Connection reset by peer: socket write error
>>
>> I would ultimately want to store the result set as parquet. Are there any
>> other options to handle this?
>>
>> Thanks!
>>
>> On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:
>>
>>> I have 14GB of parquet data, and when I try to order it with Spark SQL
>>> and save the first 1M rows, the job keeps failing with "Connection reset
>>> by peer: socket write error" on the executors.
>>>
>>> I've allocated about 10g to both the driver and the executors, and set
>>> maxResultSize to 10g, but it still fails with the same error. I'm using
>>> Spark 1.5.1.
>>>
>>> Are there any other alternative ways to handle this?
>>>
>>> Thanks!