jornfra...@gmail.com]
Sent: Monday, June 4, 2018 10:59 PM
To: Jain, Neha T. <neha.t.j...@accenture.com>
Cc: user@spark.apache.org; Patel, Payal <payal.pa...@accenture.com>;
Sing, Jasbir <jasbir.s...@accenture.com>
Subject: Re: [E
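I also think there is a misunderstanding of how repartition works. It does not
create one partition per user id: it hash-partitions the rows by userid into a
fixed number of partitions (when only a column is given, that number comes from
spark.sql.shuffle.partitions). This means each partition is likely to contain
several different user ids.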
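
To illustrate the point, here is a rough plain-Python sketch of hash
partitioning by userid. It is not Spark code: the partition count, the sample
rows, and the helper names (partition_for, hash_partition) are made up for the
example, and a simple modulo stands in for the hash function Spark actually
uses. It only shows that all rows of one user end up together while a
partition still holds several users.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, purely illustrative


def partition_for(user_id: int, num_partitions: int) -> int:
    # Stand-in for a hash partitioner: modulo on the integer id.
    # Spark itself uses a different hash function; this is only a sketch.
    return user_id % num_partitions


def hash_partition(rows, num_partitions):
    # Group rows by the partition index derived from their userid.
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["userid"], num_partitions)].append(row)
    return dict(partitions)


# Hypothetical sample data: 10 users with 3 rows each.
rows = [{"userid": uid, "event": n} for uid in range(1, 11) for n in range(3)]

partitions = hash_partition(rows, NUM_PARTITIONS)

# Distinct user ids found in each partition.
users_per_partition = {
    p: sorted({r["userid"] for r in part}) for p, part in partitions.items()
}

for p in sorted(users_per_partition):
    print(f"partition {p}: userids {users_per_partition[p]}")
```

With 10 users and 4 partitions, every partition necessarily contains more than
one user id, so any per-partition logic that assumes a single user per
partition would give surprising results.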
That would also explain your observed behavior. However, I fear that without
the full source code it will be difficult to troubleshoot the issue.
How do you load the data? How do you write it?
Which Spark version are you using?
The use case is not yet 100% clear to me. Do you want to mark the row with the
oldest/newest date as true? I would just use top or something similar when