After adding the sequential ids you might need a repartition. I've found
before that when using monotonically_increasing_id the DataFrame ends up on a
single partition. It usually becomes clear in the Spark UI though.
On Tue, 6 Oct 2020, 20:38 Sachit Murarka wrote:
> Yes, even I tried the same first. Then I moved t
> When you say "do the count as the final step", are you referring to
> getting the counts of the individual data frames, or from the already
> outputted parquet?
>
> Thanks and I appreciate your reply
>
> On Thu, Feb 13, 2020 at 4:15 PM David Edwards
> wrote:
>
>
Hi Ashley,
I'm not an expert, but I think this is because Spark does lazy execution and
doesn't actually do any work until you perform an action such as a write,
count, or similar operation on the dataframe.
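For example, a rough sketch (PySpark assumed; the paths and column names are
made up):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("/path/to/input")  # placeholder input

    # These transformations are lazy: Spark only records the plan here.
    transformed = (df
                   .filter(F.col("value").isNotNull())
                   .withColumn("doubled", F.col("value") * 2))

    # transformed.count()  # an intermediate count like this triggers a full job by itself

    # The write is the action that actually runs the single, optimised plan.
    transformed.write.mode("overwrite").parquet("/path/to/output")

    # If a count is needed, doing it on the written output avoids recomputing the lineage.
    final_count = spark.read.parquet("/path/to/output").count()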
If you remove the intermediate count steps it will work out a more efficient
execution plan, reducing the number