Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Bjørn Jørgensen
Something like this, using a list comprehension:

    doc_types = ["AB", "AA", "AC"]
    result = df.groupBy("code").agg(
        *[F.sum(F.when(F.col("doc_type") == dt, F.col("amount"))).alias(f"{dt}_amnt") for dt in doc_types],
        F.first("load_date").alias("load_date")
    )

It doesn't use pivot at all. søn

Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Mich Talebzadeh
Well I tried using windowing functions with pivot() and it did not work. From your reply, you are looking for a function that would ideally combine the conciseness of pivot() with the flexibility of explicit aggregations. While Spark provides powerful tools, there is not a single built-in function

Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Dhruv Singla
Yes, this is it. I want to form this using a simple short command. The way I mentioned is a lengthy one.

On Sun, Mar 9, 2025 at 10:16 PM Mich Talebzadeh wrote:
> Is this what you are expecting?
>
> root
> |-- code: integer (nullable = true)
> |-- AB_amnt: long (nullable = true)
> |-- AA_amnt:

Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Dhruv Singla
Hey, I already know this and wrote the same thing in my question. I know formatting can make the code a lot simpler and easier to understand, but I'm looking to see whether there is already a function or a Spark built-in for this. Thanks for the help though.

On Sun, Mar 9, 2025 at 11:42 PM Mich Talebzadeh w

Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Mich Talebzadeh
import pyspark
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.functions import struct
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType

Re: Apply pivot only on some columns in pyspark

2025-03-09 Thread Mich Talebzadeh
Is this what you are expecting?

root
 |-- code: integer (nullable = true)
 |-- AB_amnt: long (nullable = true)
 |-- AA_amnt: long (nullable = true)
 |-- AC_amnt: long (nullable = true)
 |-- load_date: date (nullable = true)

++---+---+---+--+
|code|AB_amnt|AA_amnt|AC_amnt|l

Apply pivot only on some columns in pyspark

2025-03-09 Thread Dhruv Singla
Hi everyone,

Hope you are doing well. I have the following dataframe.

df = spark.createDataFrame(
    [
        [1, 'AB', 12, '2022-01-01']
        , [1, 'AA', 22, '2022-01-10']
        , [1, 'AC', 11, '2022-01-11']
        , [2, 'AB', 22, '2022-02-01']
        , [2, 'AA', 28, '2022-02-10']