Maybe something like

 

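// Start from the first DataFrame rather than spark.sqlContext.emptyDataFrame:
// the empty DataFrame has no columns, so union with a non-empty schema fails.
var finalDF = dfs.head

for (df <- dfs.tail) {

    finalDF = finalDF.union(df)

}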

Where dfs is a Seq of dataframes.
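
Since the loop above still builds the same left-nested chain of unions as reduce(_ union _), another thing to try is combining the DataFrames pairwise, so the plan tree has depth of roughly log2(n) instead of n. This is only a sketch: balancedReduce is a hypothetical helper (not part of Spark), it assumes dfs is non-empty and that all DataFrames share the same schema, and I have not measured whether it actually shortens planning time for your case.

```scala
// Hypothetical helper, not a Spark API: combines the items pairwise so the
// resulting expression tree is balanced instead of a long left-nested chain.
// The order of the items is preserved.
def balancedReduce[A](items: Seq[A])(combine: (A, A) => A): A = {
  require(items.nonEmpty, "balancedReduce needs at least one item")
  if (items.size == 1) {
    items.head
  } else {
    // Split in half, reduce each half recursively, then combine the results.
    val (left, right) = items.splitAt(items.size / 2)
    combine(balancedReduce(left)(combine), balancedReduce(right)(combine))
  }
}

// Usage with DataFrames (assumes dfs: Seq[DataFrame] with identical schemas):
// val finalDF = balancedReduce(dfs)(_ union _)
```

The helper is generic, so it can be checked on plain values first, for example balancedReduce(Seq(1, 2, 3, 4, 5))(_ + _).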

 

From: Cesar <ces...@gmail.com>
Date: Thursday, April 5, 2018 at 2:17 PM
To: user <user@spark.apache.org>
Subject: Union of multiple data frames

The following code works for small n, but not for large n (>20):

 

val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)

dfUnion.show()

 

By "not working", I mean that Spark takes a very long time to create the execution plan.

 

Is there a more optimal way to perform a union of multiple data frames?


 

thanks

-- 

Cesar Flores
