Re: [Spark SQL]: Does Union operation followed by drop duplicate follows "keep first"

2019-09-14 Thread Dhaval Patel
Hi Abhinesh,

Since drop duplicates keeps the first record, you can assign an id to the 1st and 2nd df and then Union -> sort on that id -> drop duplicates. This will ensure that records from the 1st df are kept and those from the 2nd are dropped.

Regards,
Dhaval

On Sat, Sep 14, 2019 at 4:41 PM Abhinesh Hada wrote:
> Hey Nathan,
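
A minimal sketch of the "tag each df, union, keep the row from the 1st df" idea described above. It is an illustration under assumptions, not code from this thread: the table names df1 and df2, the columns id and payload, and the src tag are all hypothetical. Instead of relying on drop duplicates keeping the first row after a sort (which, as far as I know, Spark does not guarantee once the data is shuffled), it picks the winner explicitly with ROW_NUMBER(). The query is run through Python's built-in sqlite3 only so that it can be tried without a cluster; the same query shape should be usable in Spark SQL.

```python
import sqlite3

# In-memory stand-ins for the two DataFrames (hypothetical names and columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df1 (id INTEGER, payload TEXT);
    CREATE TABLE df2 (id INTEGER, payload TEXT);
    INSERT INTO df1 VALUES (1, 'a-from-df1'), (2, 'b-from-df1');
    INSERT INTO df2 VALUES (2, 'b-from-df2'), (3, 'c-from-df2');
""")

# 1. Tag every row with its source (src = 1 for df1, 2 for df2).
# 2. UNION ALL the tagged rows.
# 3. Number the rows within each id, lowest src first.
# 4. Keep only row number 1, so df1 wins whenever both sides have the id.
QUERY = """
SELECT id, payload
FROM (
    SELECT id, payload,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY src) AS rn
    FROM (
        SELECT id, payload, 1 AS src FROM df1
        UNION ALL
        SELECT id, payload, 2 AS src FROM df2
    ) AS tagged
) AS ranked
WHERE rn = 1
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

For id 2, which exists in both tables, the row from df1 is the one that survives; ids that exist on only one side pass through unchanged. No join is involved, only a union and one partitioned ranking step.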

2019-09-14 Thread Abhinesh Hada
Hey Nathan,

As the dataset is huge, I am looking for approaches that involve as few joins as possible. I will give your approach a try. Thanks a lot for your help.

On Sat, Sep 14, 2019 at 12:58 AM Nathan Kronenfeld wrote:
> It's a bit of a pain, but you could just use an outer join (assuming there
> are