Re: Union of multiple data frames

2018-04-06 Thread Alessandro Solimando
...5, 2018 at 1:29 PM, Andy Davidson <a...@santacruzintegration.com> wrote:
>> Hi Cesar,
>> I have used Brandon's approach in the past without any problems.
>> Andy
>> From: Brandon Geise
>> Date: Thursday, April 5, 2018 at 11:23 AM

Re: Union of multiple data frames

2018-04-05 Thread Cesar
...ndy Davidson wrote:
> Hi Cesar,
> I have used Brandon's approach in the past without any problems.
> Andy
> From: Brandon Geise
> Date: Thursday, April 5, 2018 at 11:23 AM
> To: Cesar, "user @spark"
> Subject: Re: Union of multiple data frames

Re: Union of multiple data frames

2018-04-05 Thread Andy Davidson
Hi Cesar,
I have used Brandon's approach in the past without any problems.
Andy
From: Brandon Geise
Date: Thursday, April 5, 2018 at 11:23 AM
To: Cesar, "user @spark"
Subject: Re: Union of multiple data frames
> Maybe something like
>
> var finalDF = spark.sqlCon...

Re: Union of multiple data frames

2018-04-05 Thread Brandon Geise
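Maybe something like

    // Start from the first DataFrame: spark.sqlContext.emptyDataFrame has no
    // columns, so a union with it fails (union needs matching column counts).
    var finalDF = dfs.head
    for (df <- dfs.tail) {
      finalDF = finalDF.union(df)
    }

where dfs is a Seq of DataFrames.
From: Cesar
Date: Thursday, April 5, 2018 at 2:17 PM
To: user
Subject: Union of multiple data frames
The following c...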
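The loop above still adds one DataFrame at a time, so each union wraps the previous result and the nesting grows with n. One commonly suggested alternative is to combine the inputs pairwise, level by level, like a balanced binary tree, so the nesting grows with roughly log2(n) instead. The sketch below is plain Scala and generic over the element type; `BalancedUnion` and `balancedReduce` are hypothetical names (not Spark APIs), and whether this speeds up plan creation on a particular Spark version is an assumption worth benchmarking.

```scala
object BalancedUnion {
  // Hypothetical helper (not a Spark API): combines items pairwise, level by
  // level, like a balanced binary tree. For an associative `combine` the result
  // equals items.reduce(combine) and element order is preserved, but the
  // nesting depth is about log2(n) instead of n - 1.
  def balancedReduce[T](items: Seq[T])(combine: (T, T) => T): T = {
    require(items.nonEmpty, "balancedReduce needs at least one element")
    var level: Seq[T] = items
    while (level.size > 1) {
      // grouped(2) yields pairs, plus a trailing singleton when the size is odd
      level = level.grouped(2).map { group =>
        if (group.size == 2) combine(group(0), group(1)) else group.head
      }.toVector
    }
    level.head
  }
}
```

For instance, `balancedReduce(Seq("a", "b", "c", "d"))((x, y) => s"($x $y)")` gives `((a b) (c d))`, while `Seq("a", "b", "c", "d").reduce((x, y) => s"($x $y)")` gives `(((a b) c) d)`. With Spark on the classpath, the DataFrame version would be `BalancedUnion.balancedReduce(dfs)(_ union _)`; like `union` itself, this matches columns by position, so all inputs need the same column order.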

Union of multiple data frames

2018-04-05 Thread Cesar
The following code works for small n, but not for large n (>20):

    val dfUnion = Seq(df1, df2, df3, ...dfn).reduce(_ union _)
    dfUnion.show()

By "not working", I mean that Spark takes a lot of time to create the execution plan. Is there a more efficient way to perform a union of multiple data fra...
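To see how the work can pile up as n grows, it may help to look at the shape that `reduce(_ union _)` produces: a left fold nests each union inside the next, giving a chain n - 1 unions deep. The toy model below only measures that depth; `UnionPlanShape`, `Plan`, `Scan` and `Union` are hypothetical stand-ins, not Spark's Catalyst classes, and how much of this nesting Spark's analyzer and optimizer actually process (or flatten) depends on the Spark version, so treat it as an illustration rather than a diagnosis.

```scala
object UnionPlanShape {
  // Toy stand-ins for a logical plan (hypothetical, not Spark's Catalyst
  // classes): Scan is one input DataFrame, Union is one binary union.
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class Union(left: Plan, right: Plan) extends Plan

  // Number of Union nodes on the longest path from the root to a Scan.
  def depth(plan: Plan): Int = plan match {
    case Scan(_)     => 0
    case Union(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // The shape Seq(df1, ..., dfn).reduce(_ union _) builds:
  // a left-deep chain (((df1 U df2) U df3) ... U dfn). Requires n >= 1.
  def leftFold(n: Int): Plan = {
    val leaves: Seq[Plan] = (1 to n).map(i => Scan(s"df$i"))
    leaves.reduce[Plan]((l, r) => Union(l, r))
  }
}
```

Here `depth(leftFold(21))` is 20, one level per union, whereas a pairwise (tree-shaped) combination of the same 21 inputs is 5 levels deep. A different workaround, offered only as an unverified assumption about this setup, is to union everything at the RDD level in a single call, `spark.createDataFrame(spark.sparkContext.union(dfs.map(_.rdd)), dfs.head.schema)`, which requires identical schemas and gives up Catalyst optimizations across the RDD boundary.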