Thanks for your answers. The suggested method works when the number of Data Frames is small.
However, I am trying to union >30 Data Frames, and the time to create the plan is longer than the execution time itself, which should not be the case.

Thanks!

--
Cesar

On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson <a...@santacruzintegration.com> wrote:
> Hi Cesar
>
> I have used Brandon's approach in the past without any problem.
>
> Andy
>
> From: Brandon Geise <brandonge...@gmail.com>
> Date: Thursday, April 5, 2018 at 11:23 AM
> To: Cesar <ces...@gmail.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: Union of multiple data frames
>
> Maybe something like:
>
> var finalDF = spark.sqlContext.emptyDataFrame
> for (df <- dfs) {
>   finalDF = finalDF.union(df)
> }
>
> where dfs is a Seq of DataFrames.
>
> From: Cesar <ces...@gmail.com>
> Date: Thursday, April 5, 2018 at 2:17 PM
> To: user <user@spark.apache.org>
> Subject: Union of multiple data frames
>
> The following code works for small n, but not for large n (>20):
>
> val dfUnion = Seq(df1, df2, df3, ..., dfn).reduce(_ union _)
> dfUnion.show()
>
> By not working, I mean that Spark takes a lot of time to create the execution plan.
>
> Is there a more optimal way to perform a union of multiple data frames?
>
> thanks
>
> --
> Cesar Flores

--
Cesar Flores
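
One workaround sometimes used when the union of many DataFrames makes plan analysis slow is to union the underlying RDDs instead of chaining DataFrame unions, so the Catalyst plan does not grow with each union. A minimal sketch, assuming every DataFrame shares the same schema; the names dfs and unionMany below are illustrative, not from the thread:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical helper: union many DataFrames at the RDD level so the
// logical plan is a single scan of the combined RDD rather than a deep
// chain of nested Union nodes. Assumes all DataFrames have the same schema.
def unionMany(spark: SparkSession, dfs: Seq[DataFrame]): DataFrame = {
  val rdds: Seq[RDD[Row]] = dfs.map(_.rdd)
  spark.createDataFrame(spark.sparkContext.union(rdds), dfs.head.schema)
}

// Example usage (illustrative):
// val dfUnion = unionMany(spark, Seq(df1, df2, df3))
// dfUnion.show()

Another option sometimes suggested is to checkpoint the intermediate DataFrame every few unions, which truncates the accumulated plan at the cost of materializing data to the checkpoint directory.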