
Re: [Spark SQL]: Does Union operation followed by drop duplicate follows "keep first"

2019-09-14 Thread Abhinesh Hada

ldLeft(ds4) { case (ds, nextColumnName) =>
> ds.drop(ds1(nextColumnName)).drop(ds2(nextColumnName))
> }.drop("id")
>
> // And get rid of our new_ marker
> val ds6 = allColumns.foldLeft(ds5) { case (ds, nextColumnName) =>
> ds.withCo

[Spark SQL]: Does Union operation followed by drop duplicate follows "keep first"

2019-09-13 Thread Abhinesh Hada

Hi, I am trying to take the union of two DataFrames and then drop duplicates based on the value of a specific column. However, I want to make sure that, when duplicates are dropped, the rows from the first DataFrame are the ones kept. Example: df1 = df1.union(df2).dropDuplicates(['id'])