Re: prefix column Spark

2016-04-19 Thread Michael Armbrust
A few comments:
- Each withColumnRenamed is adding a new level to the logical plan. We have optimized this significantly in newer versions of Spark, but it is still not free.
- Transforming to an RDD is going to do a fairly expensive conversion back and forth between the internal binary format. -
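[Editorial note, not part of the quoted reply: one common way to avoid adding one plan level per rename is to do all the renames in a single select with aliased columns. A minimal sketch, assuming the same prefixing goal as the thread; the function name is hypothetical:]

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Rename every column in one select: a single projection node in the
    // logical plan, rather than one withColumnRenamed node per column.
    def prefixWithSelect(df: DataFrame, prefix: String): DataFrame =
      df.select(df.columns.map(c => col(c).alias(s"${prefix}_${c}")): _*)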

Re: prefix column Spark

2016-04-19 Thread nihed mbarek
Hi, thank you. It's the first solution, and it took a long time to manage all my fields.
Regards,

On Tue, Apr 19, 2016 at 11:29 AM, Ndjido Ardo BAR wrote:
> This can help:
>
> import org.apache.spark.sql.DataFrame
>
> def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
>   val colN

Re: prefix column Spark

2016-04-19 Thread Ndjido Ardo BAR
This can help:

    import org.apache.spark.sql.DataFrame

    def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
      val colNames = dataFrame.columns
      colNames.foldLeft(dataFrame) { (df, colName) =>
        df.withColumnRenamed(colName, s"${prefix}_${colName}")
      }
    }
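[Editorial note: a hypothetical usage of the function above; the DataFrame name and columns are assumptions, not from the thread. Note that the foldLeft applies one withColumnRenamed per column, which is the per-column plan growth Michael Armbrust mentions above.]

    // Assuming a DataFrame `events` with columns "id" and "ts":
    // val prefixed = prefixDf(events, "evt")
    // prefixed.columns  // Array("evt_id", "evt_ts")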

prefix column Spark

2016-04-19 Thread nihed mbarek
Hi,
I want to prefix the columns of a set of DataFrames, and I have tried two solutions:
* a for loop calling withColumnRenamed based on columns()
* transforming my DataFrame to an RDD, updating the old schema, and recreating the DataFrame (see the sketch below).
Both are working for me; the second one is faster with tables that contain 800
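[Editorial note: a minimal sketch of what the second approach could look like, assuming it means rebuilding the schema with prefixed field names and recreating the DataFrame from the underlying row RDD; the function name and the SparkSession handle are hypothetical, not from the original message:]

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.types.StructType

    // Prefix every field name in the schema, then rebuild the DataFrame from
    // the row RDD in one createDataFrame call instead of one rename per column.
    def prefixViaSchema(spark: SparkSession, df: DataFrame, prefix: String): DataFrame = {
      val prefixedSchema = StructType(
        df.schema.fields.map(f => f.copy(name = s"${prefix}_${f.name}"))
      )
      spark.createDataFrame(df.rdd, prefixedSchema)
    }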