A few comments:

- Each withColumnRenamed is adding a new level to the logical plan. We have optimized this significantly in newer versions of Spark, but it is still not free.
- Transforming to an RDD is going to do a fairly expensive conversion back and forth between the internal binary format.
- Probably the best way to accomplish this is to build up all the new columns you want and pass them to a single select call.
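For example, a minimal sketch of that single-select approach (the helper name `prefixAll` is hypothetical; it uses `org.apache.spark.sql.functions.col`):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Rename every column in one select call, so the logical plan gets a
// single projection instead of one extra level per renamed column.
def prefixAll(df: DataFrame, prefix: String): DataFrame =
  df.select(df.columns.map(c => col(c).as(s"${prefix}_$c")): _*)
```

This stays entirely in the DataFrame API, so it avoids the RDD round-trip as well.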
On Tue, Apr 19, 2016 at 3:04 AM, nihed mbarek <nihe...@gmail.com> wrote:

> Hi
> thank you, it's the first solution and it took a long time to manage all
> my fields
>
> Regards,
>
> On Tue, Apr 19, 2016 at 11:29 AM, Ndjido Ardo BAR <ndj...@gmail.com>
> wrote:
>
>> This can help:
>>
>> import org.apache.spark.sql.DataFrame
>>
>> def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
>>   val colNames = dataFrame.columns
>>   colNames.foldLeft(dataFrame) {
>>     (df, colName) => df.withColumnRenamed(colName, s"${prefix}_${colName}")
>>   }
>> }
>>
>> cheers,
>> Ardo
>>
>> On Tue, Apr 19, 2016 at 10:53 AM, nihed mbarek <nihe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to prefix a set of dataframes and I tried two solutions:
>>> * A for loop calling withColumnRenamed based on columns()
>>> * Transforming my DataFrame to an RDD, updating the old schema, and
>>> recreating the dataframe.
>>>
>>> Both work for me; the second one is faster with tables that contain
>>> 800 columns but has one extra transformation stage (toRDD).
>>>
>>> Is there any other solution?
>>>
>>> Thank you
>>>
>>> --
>>> M'BAREK Med Nihed,
>>> Fedora Ambassador, TUNISIA, Northern Africa
>>> http://www.nihed.com
>>> <http://tn.linkedin.com/in/nihed>