Thanks Krishna, but I believe memory on the executors is being exhausted in my case. I've already allocated the maximum 10g I can to both the driver and the executors. Are there any alternative solutions for fetching the top 1M rows after ordering the dataset?
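
In case it helps frame the question, one alternative I've been considering
is to avoid the large LIMIT entirely: filter on a row index instead and
write the result straight to parquet, so the full result never has to land
on a single executor or the driver. A minimal sketch against the Spark 1.5
API (sqlContext is the usual shell context; "tbl", "c1" and the output path
are placeholders for my actual table, sort column and destination):

    // Global sort, same as before, but without the LIMIT.
    val sorted = sqlContext.sql("select * from tbl order by c1")

    // The range partitioning from the sort means partition order matches
    // global sort order, so the first 1M indices are the top 1M rows.
    // zipWithIndex keeps the rows distributed across the executors.
    val topRows = sorted.rdd
      .zipWithIndex()
      .filter { case (_, idx) => idx < 1000000L }
      .map { case (row, _) => row }

    // Write straight to parquet so the result never has to fit in one place.
    sqlContext.createDataFrame(topRows, sorted.schema)
      .write
      .parquet("/path/to/output")

My understanding is that zipWithIndex costs one extra job to compute the
per-partition offsets, but everything stays on the executors, so driver
memory and maxResultSize stop being the bottleneck. Does that sound right,
or is there a better pattern for this?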
Thanks!

On Fri, Apr 29, 2016 at 6:01 PM, Krishna <research...@gmail.com> wrote:

> I recently encountered similar network-related errors and was able to fix
> them by applying the ethtool updates described here:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085
>
> On Friday, April 29, 2016, Buntu Dev <buntu...@gmail.com> wrote:
>
>> Just to provide more details, I have 200 blocks (parquet files) with an
>> average block size of 70M. Limiting the result set to 100k ("select * from
>> tbl order by c1 limit 100000") works, but when I increase it to 1M I keep
>> running into this error:
>>
>> Connection reset by peer: socket write error
>>
>> I would ultimately want to store the result set as parquet. Are there any
>> other options to handle this?
>>
>> Thanks!
>>
>> On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:
>>
>>> I have 14GB of parquet data, and when I try to order it with Spark SQL
>>> and save the first 1M rows, the job keeps failing with "Connection reset
>>> by peer: socket write error" on the executors.
>>>
>>> I've allocated about 10g to both the driver and the executors, and set
>>> maxResultSize to 10g, but it still fails with the same error. I'm using
>>> Spark 1.5.1.
>>>
>>> Are there any other alternative ways to handle this?
>>>
>>> Thanks!