A few comments:

- Each withColumnRenamed is adding a new level to the logical plan. We have optimized this significantly in newer versions of Spark, but it is still not free.
- Transforming to an RDD is going to do a fairly expensive conversion back and forth between the internal binary format.
- Probably the best way to accomplish this is to build up all the new columns you want and pass them to a single select call.
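For example, a minimal sketch of that single-select approach (the helper name `prefixAll` is hypothetical; it uses `org.apache.spark.sql.functions.col`):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Rename every column in one select call, so the logical plan gets a
// single projection instead of one extra level per renamed column.
def prefixAll(df: DataFrame, prefix: String): DataFrame =
  df.select(df.columns.map(c => col(c).as(s"${prefix}_$c")): _*)
```

This stays entirely in the DataFrame API, so it avoids the RDD round-trip as well.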
On Tue, Apr 19, 2016 at 3:04 AM, nihed mbarek <nihe...@gmail.com> wrote:

> Hi
> thank you, it's the first solution and it took a long time to manage all
> my fields
>
> Regards,
>
> On Tue, Apr 19, 2016 at 11:29 AM, Ndjido Ardo BAR <ndj...@gmail.com>
> wrote:
>
>> This can help:
>>
>> import org.apache.spark.sql.DataFrame
>>
>> def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
>>   val colNames = dataFrame.columns
>>   colNames.foldLeft(dataFrame) {
>>     (df, colName) => df.withColumnRenamed(colName, s"${prefix}_${colName}")
>>   }
>> }
>>
>> cheers,
>> Ardo
>>
>> On Tue, Apr 19, 2016 at 10:53 AM, nihed mbarek <nihe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to prefix a set of dataframes and I tried two solutions:
>>> * A for loop calling withColumnRenamed based on columns()
>>> * Transforming my DataFrame to an RDD, updating the old schema, and
>>> recreating the dataframe.
>>>
>>> Both work for me; the second one is faster with tables that contain
>>> 800 columns but has one extra transformation stage (toRDD).
>>>
>>> Is there any other solution?
>>>
>>> Thank you
>>>
>>> --
>>> M'BAREK Med Nihed,
>>> Fedora Ambassador, TUNISIA, Northern Africa
>>> http://www.nihed.com
>>> <http://tn.linkedin.com/in/nihed>