Do you have the logs of the containers? This seems like a memory issue.
2016-11-10 7:28 GMT+00:00 lk_spark :
> hi, all:
> when I call the API df.write.parquet, there are a lot of small files. How
> can I merge them into one file? I tried df.coalesce(1).write.parquet, but
> it sometimes fails with an error.
To: lk_spark
Cc: user.spark
Subject: RE: Re:RE: how to merge dataframe write output files
Your coalesce should technically work. One thing to check would be overhead
memory; you should configure it as 10% of executor memory. Also, you might
need to increase maxResultSize. The data itself looks fine.
I don't know the answer to this, but I'm pretty sure there should be a
way to work with fragmented files too.
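As a sketch of the two settings suggested above (a minimal example, assuming a Spark-on-YARN deployment and an existing input path; the values and paths below are illustrative assumptions, not tested settings, and newer Spark releases rename the overhead key to spark.executor.memoryOverhead):

```python
from pyspark.sql import SparkSession

# Sketch only: key names/values are illustrative, not recommendations.
spark = (
    SparkSession.builder
    .appName("merge-parquet-output")
    # Executor overhead memory, roughly 10% of executor memory as suggested
    # above. (spark.yarn.executor.memoryOverhead takes a value in MB on
    # Spark-on-YARN; newer releases use spark.executor.memoryOverhead.)
    .config("spark.yarn.executor.memoryOverhead", "1024")
    # Raise the limit on serialized results sent back to the driver.
    .config("spark.driver.maxResultSize", "2g")
    .getOrCreate()
)

df = spark.read.parquet("/input/path")  # hypothetical path

# coalesce(1) avoids a shuffle but funnels the whole upstream computation
# through a single task; repartition(1) shuffles first, so the upstream work
# stays parallel and only the final write runs in one task, which can be
# gentler on a single executor's memory.
df.repartition(1).write.mode("overwrite").parquet("/output/path")
```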
From: lk_spark [mailto:lk_sp...@163.com]
Sent: Thursday, November 10, 2016 12:20 AM
To: Shreya Agarwal
Cc: user.spark
Subject: Re:RE: how to merge dataframe write output files
thanks.
From: lk_spark [mailto:lk_sp...@163.com]
Sent: Wednesday, November 9, 2016 11:29 PM
To: user.spark
Subject: how to merge dataframe write output files
hi, all:
when I call the API df.write.parquet, there are a lot of small files. How can
I merge them into one file? I tried df.coalesce(1).write.parquet, but it
sometimes fails with:
Container exited with a non-zero exit code 143
more and more...
-rw-r--r-- 2 hadoop supergroup 14.5 K 20
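One note on the error quoted above: exit code 143 is 128 + SIGTERM (signal 15), meaning the JVM was killed from outside; on YARN this is typically the NodeManager killing a container that exceeded its memory limit, which fits the memory diagnosis earlier in the thread. A quick check of that arithmetic:

```python
import signal

# Exit code 143 = 128 + SIGTERM: the JVM was terminated by signal 15,
# commonly YARN killing a container that ran over its memory limit.
assert 128 + int(signal.SIGTERM) == 143
print(128 + int(signal.SIGTERM))  # prints 143
```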