Re: how to merge dataframe write output files

2016-11-10 Thread Jorge Sánchez
Do you have the logs of the containers? This seems like a memory issue.

2016-11-10 7:28 GMT+00:00 lk_spark:
> hi all,
> when I call the API df.write.parquet, there are a lot of small files. How
> can I merge them into one file? I tried df.coalesce(1).write.parquet, but
> it will get error some

RE: Re:RE: how to merge dataframe write output files

2016-11-10 Thread Mendelson, Assaf
: lk_spark
Cc: user.spark
Subject: RE: Re:RE: how to merge dataframe write output files

Your coalesce should technically work. One thing to check would be overhead memory: you should configure it as 10% of executor memory. You might also need to increase maxResultSize. Also, the data looks fine
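The 10% sizing rule mentioned above can be worked through as a small sketch. It assumes Spark on YARN from around the time of this thread, where the overhead is set through `spark.yarn.executor.memoryOverhead` (in MiB; newer releases call it `spark.executor.memoryOverhead`) and the result-size limit through `spark.driver.maxResultSize`. The 384 MiB floor mirrors Spark's documented default; the helper names and the 8192 MiB / `2g` example values are hypothetical and not taken from the thread.

```python
# Sketch only: derive the YARN executor memory overhead from the
# "10% of executor memory" rule and print matching spark-submit flags.

MIN_OVERHEAD_MIB = 384  # floor used by Spark's default overhead calculation


def overhead_mib(executor_memory_mib, fraction=0.10):
    """Return the overhead in MiB: a fraction of executor memory, never below the floor."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * fraction))


def spark_submit_conf(executor_memory_mib, max_result_size="2g"):
    """Build --conf flags for spark-submit (hypothetical helper, example values)."""
    settings = {
        "spark.executor.memory": f"{executor_memory_mib}m",
        "spark.yarn.executor.memoryOverhead": str(overhead_mib(executor_memory_mib)),
        "spark.driver.maxResultSize": max_result_size,
    }
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


print(" ".join(spark_submit_conf(8192)))
```

Whether these values are enough depends on the job; the container logs remain the place to confirm that memory is the actual cause.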

RE: Re:RE: how to merge dataframe write output files

2016-11-10 Thread Shreya Agarwal
y I don't know the answer to this, but I'm pretty sure there should be a way to work with fragmented files too.

From: lk_spark [mailto:lk_sp...@163.com]
Sent: Thursday, November 10, 2016 12:20 AM
To: Shreya Agarwal
Cc: user.spark
Subject: Re:RE: how to merge dataframe write output files

thank

Re:RE: how to merge dataframe write output files

2016-11-10 Thread lk_spark
for JRE or any other runtime to load in memory on a single box.

From: lk_spark [mailto:lk_sp...@163.com]
Sent: Wednesday, November 9, 2016 11:29 PM
To: user.spark
Subject: how to merge dataframe write output files

hi all,
when I call the API df.write.parquet, there are a lot of small files. How can

RE: how to merge dataframe write output files

2016-11-09 Thread Shreya Agarwal
merge dataframe write output files

hi all,
when I call the API df.write.parquet, there are a lot of small files. How can I merge them into one file? I tried df.coalesce(1).write.parquet, but it sometimes fails with an error:

Container exited with a non-zero exit code 143

more and more...
-rw-r--r
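As a side note on the exit code quoted above: by POSIX shell convention, a status above 128 means the process was ended by signal number (status - 128), so 143 corresponds to signal 15. The sketch below only decodes that number; the function name is hypothetical. On YARN this often, though not always, means the container was killed from outside, for example after exceeding its memory limit, which only the container logs can confirm.

```python
import signal


def describe_exit_code(code):
    """Explain an exit status using the 128 + signal-number convention."""
    if code > 128:
        # Statuses above 128 encode the terminating signal number.
        sig = signal.Signals(code - 128)
        return f"terminated by {sig.name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(143))
```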

how to merge dataframe write output files

2016-11-09 Thread lk_spark
hi all,
when I call the API df.write.parquet, there are a lot of small files. How can I merge them into one file? I tried df.coalesce(1).write.parquet, but it sometimes fails with an error:

Container exited with a non-zero exit code 143

more and more...
-rw-r--r-- 2 hadoop supergroup 14.5 K 20