Re: Coalesce vs reduce operation parameter

2021-03-20 Thread Attila Zsolt Piros
Hi! Actually *coalesce()* is usually a cheap operation as it moves some existing partitions from one node to another. So it is not a (full) shuffle. See the documentation
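
To illustrate the distinction drawn above, here is a minimal toy sketch in plain Python. It is not Spark's implementation: lists stand in for partitions, and the function names `coalesce` and `full_shuffle` are hypothetical helpers made up for this example. The first merges whole existing partitions, while the second reassigns every record individually, which is roughly what a full shuffle has to do. In Spark itself, `repartition(n)` on an RDD is the shuffle-enabled form of `coalesce`.

```python
# Toy model only: plain Python lists stand in for Spark partitions.
# This is NOT Spark's implementation; both function names are hypothetical.

def coalesce(partitions, n):
    """Merge whole neighbouring partitions into n groups.

    No record is inspected individually: each input partition is
    appended as a unit to exactly one output partition.
    """
    if n >= len(partitions):
        # Without a shuffle, the partition count cannot be increased.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def full_shuffle(partitions, n):
    """Reassign every single record to an output partition by hash."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            out[hash(record) % n].append(record)
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))      # whole partitions are merged
print(full_shuffle(parts, 2))  # records are redistributed one by one
```

In the toy model, `coalesce` touches four partitions while `full_shuffle` touches eight records; on a real cluster that difference is what makes the shuffle-free path comparatively cheap.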

Re: Coalesce vs reduce operation parameter

2021-03-20 Thread vaquar khan
Hi Pedro, What is your use case, and why did you use coalesce? coalesce() is a very expensive operation, as it shuffles the data across many partitions, hence try to minimize repartitioning as much as possible. Regards, Vaquar khan On Thu, Mar 18, 2021, 5:47 PM Pedro Tuero wrote: > I was reviewing a spark