A few comments:
- Each withColumnRenamed call adds a new level to the logical plan. We
have optimized this significantly in newer versions of Spark, but it is
still not free.
- Transforming to an RDD is going to do a fairly expensive conversion
back and forth between the internal binary format and Java objects.
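Given those two costs, a common alternative (a sketch, not from this thread) is to rename every column in a single select with aliases, which adds only one projection to the logical plan and avoids the RDD round trip entirely:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Rename all columns in one projection instead of issuing one
// withColumnRenamed call per column; this adds a single node
// to the logical plan regardless of the number of columns.
def prefixAll(df: DataFrame, prefix: String): DataFrame =
  df.select(df.columns.map(c => col(c).alias(s"${prefix}_$c")): _*)
```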
Hi,
Thank you. I went with the first solution, and it took a long time to
handle all my fields.
Regards,
On Tue, Apr 19, 2016 at 11:29 AM, Ndjido Ardo BAR wrote:
This can help:
import org.apache.spark.sql.DataFrame
def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
  val colNames = dataFrame.columns
  colNames.foldLeft(dataFrame) { (df, colName) =>
    df.withColumnRenamed(colName, s"${prefix}_${colName}")
  }
}
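A quick usage sketch of the function above (the data and column names are hypothetical, and a SparkSession named `spark` is assumed to be in scope):

```scala
// Hypothetical two-column DataFrame for illustration.
val df = spark.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")
val prefixed = prefixDf(df, "t1")
// prefixed.columns would then be Array("t1_id", "t1_value")
```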
Hi,
I want to prefix the columns of a set of DataFrames, and I have tried
two solutions:
* A for loop calling withColumnRenamed based on columns()
* Transforming my DataFrame to an RDD, updating the old schema, and
recreating the DataFrame.
Both work for me; the second one is faster with tables that contain
800
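The second approach might look roughly like this (a sketch, not the poster's exact code; it assumes a SparkSession named `spark` is in scope — on older Spark versions this would be a SQLContext):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Rebuild the schema with prefixed field names, then recreate the
// DataFrame from the underlying RDD. Note that going through the RDD
// deserializes and reserializes every row, which is the expensive
// conversion mentioned in the reply above.
def prefixViaRdd(df: DataFrame, prefix: String): DataFrame = {
  val newSchema = StructType(
    df.schema.fields.map(f => f.copy(name = s"${prefix}_${f.name}")))
  spark.createDataFrame(df.rdd, newSchema)
}
```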