Hi everyone, hope you are doing well.
I have the following dataframe:

```python
df = spark.createDataFrame(
    [
        [1, 'AB', 12, '2022-01-01'],
        [1, 'AA', 22, '2022-01-10'],
        [1, 'AC', 11, '2022-01-11'],
        [2, 'AB', 22, '2022-02-01'],
        [2, 'AA', 28, '2022-02-10'],
        [2, 'AC', 25, '2022-02-22'],
    ],
    'code: int, doc_type: string, amount: int, load_date: string',
)
df = df.withColumn('load_date', F.to_date('load_date'))
```

I want to pivot the amounts but keep only the first `load_date` per group. This is what I tried, and it is not giving me the desired result:

```python
(
    df.groupBy('code')
    .pivot('doc_type', ['AB', 'AA', 'AC'])
    .agg(F.sum('amount').alias('amnt'), F.first('load_date').alias('ldt'))
    .show()
)
```

```
+----+-------+----------+-------+----------+-------+----------+
|code|AB_amnt|    AB_ldt|AA_amnt|    AA_ldt|AC_amnt|    AC_ldt|
+----+-------+----------+-------+----------+-------+----------+
|   1|     12|2022-01-01|     22|2022-01-10|     11|2022-01-11|
|   2|     22|2022-02-01|     28|2022-02-10|     25|2022-02-22|
+----+-------+----------+-------+----------+-------+----------+
```

This is what I want:

```python
(
    df.groupBy('code')
    .agg(
        F.sum(F.when(F.col('doc_type') == 'AB', F.col('amount'))).alias('AB_amnt'),
        F.sum(F.when(F.col('doc_type') == 'AA', F.col('amount'))).alias('AA_amnt'),
        F.sum(F.when(F.col('doc_type') == 'AC', F.col('amount'))).alias('AC_amnt'),
        F.first('load_date').alias('load_date'),
    )
    .show()
)
```

```
+----+-------+-------+-------+----------+
|code|AB_amnt|AA_amnt|AC_amnt| load_date|
+----+-------+-------+-------+----------+
|   1|     12|     22|     11|2022-01-01|
|   2|     22|     28|     25|2022-02-01|
+----+-------+-------+-------+----------+
```

Is there a simpler way to do this? I have more than one column to pivot, and also more than one column to keep un-pivoted.

I am using Databricks 14.3 LTS with Spark 3.5.0.

Thanks & regards,
Dhruv