That sounds like a plan, as Sean suggested. I have also seen that caching the result before the coalesce provides benefits, especially for a mere 50MB of data. Check the Spark UI Storage tab for its effect.
HTH

Mich

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Wed, 3 Feb 2021 at 19:08, Sean Owen <sro...@gmail.com> wrote:

> Probably could also be because that coalesce can cause some upstream
> transformations to also have parallelism of 1. I think (?) an OK solution
> is to cache the result, then coalesce and write. Or combine the files after
> the fact, or do what Silvio said.
>
> On Wed, Feb 3, 2021 at 12:55 PM James Yu <ja...@ispot.tv> wrote:
>
>> Hi Team,
>>
>> We are running into a poor performance issue and are seeking your
>> suggestions on how to improve it:
>>
>> We have a particular dataset which we aggregate from other datasets and
>> would like to write out to one single file (because it is small enough).
>> We found that after a series of transformations (GROUP BYs, FLATMAPs),
>> we coalesced the final RDD to 1 partition before writing it out, and
>> this coalesce degraded performance. It was not that the coalesce
>> operation itself took additional runtime, but rather that it somehow
>> dictated the number of partitions used in the upstream transformations.
>>
>> We hope there is a simple and useful way to solve this kind of issue,
>> which we believe is quite common for many people.
>>
>> Thanks
>>
>> James
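For the archives, the cache-then-coalesce pattern Sean describes can be sketched roughly as below (a minimal Scala sketch; the input path, column name, and output path are illustrative, not from James's job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("coalesce-demo").getOrCreate()

// Hypothetical aggregation standing in for the GROUP BYs / FLATMAPs.
val aggregated = spark.read.parquet("/data/in")
  .groupBy("key")
  .count()

// Materialize the result with full parallelism BEFORE coalescing.
// Without this, coalesce(1) can collapse the upstream stages to a
// single task as well, which is the slowdown James observed.
aggregated.cache()
aggregated.count()  // force evaluation; check the Spark UI Storage tab

// Now only the final write runs on one partition.
aggregated.coalesce(1).write.parquet("/data/out")
```

Alternatively, `repartition(1)` instead of `coalesce(1)` inserts a shuffle boundary, which also keeps the upstream stages parallel, at the cost of that extra shuffle.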