Something like this, using a list comprehension:
doc_types = ["AB", "AA", "AC"]

result = df.groupBy("code").agg(
    *[F.sum(F.when(F.col("doc_type") == dt, F.col("amount"))).alias(f"{dt}_amnt")
      for dt in doc_types],
    F.first("load_date").alias("load_date")
)
and it doesn't use pivot.
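For comparison, here is a rough sketch of the pivot()-based route (the join step and the renaming loop are my additions, not something from this thread). pivot() names the output columns after the pivot values, and with this data load_date differs per row, so it has to be aggregated separately and joined back:

doc_types = ["AB", "AA", "AC"]

# pivot on doc_type; the resulting columns are named "AB", "AA", "AC"
pivoted = (
    df.groupBy("code")
      .pivot("doc_type", doc_types)
      .agg(F.sum("amount"))
)

# carry load_date via a separate aggregation and a join on code
dates = df.groupBy("code").agg(F.first("load_date").alias("load_date"))
result = pivoted.join(dates, "code")

# rename to the <type>_amnt convention used above
for dt in doc_types:
    result = result.withColumnRenamed(dt, f"{dt}_amnt")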
Well, I tried using window functions with pivot() and it did not work.
From your reply, you are looking for a function that would ideally combine
the conciseness of pivot() with the flexibility of explicit aggregations.
While Spark provides powerful tools, there is not a single built-in
function that does this in one call.
Yes, this is it. I want to form this using a simple, short command. The way
I mentioned is a lengthy one.
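If the goal is a single short call, one option is to wrap that aggregation in a small helper; this is just a sketch of my own, not a Spark built-in, and the name sum_amounts_by_type is hypothetical:

from pyspark.sql import functions as F

def sum_amounts_by_type(df, doc_types, group_col="code", type_col="doc_type",
                        amount_col="amount", date_col="load_date"):
    # conditional sum per doc_type plus the first load_date, same logic as above
    return df.groupBy(group_col).agg(
        *[F.sum(F.when(F.col(type_col) == dt, F.col(amount_col))).alias(f"{dt}_amnt")
          for dt in doc_types],
        F.first(date_col).alias(date_col),
    )

# the call site then stays short
result = sum_amounts_by_type(df, ["AB", "AA", "AC"])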
On Sun, Mar 9, 2025 at 10:16 PM Mich Talebzadeh wrote:
> Is this what you are expecting?
>
> root
> |-- code: integer (nullable = true)
> |-- AB_amnt: long (nullable = true)
> |-- AA_amnt: long (nullable = true)
Hey, I already know this and have written the same in my question. I know
formatting can make the code a lot simpler and easier to understand, but
I'm looking to see if there is already a function or a Spark built-in for this.
Thanks for the help though.
On Sun, Mar 9, 2025 at 11:42 PM Mich Talebzadeh wrote:
import pyspark
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.functions import struct
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType
Is this what you are expecting?
root
|-- code: integer (nullable = true)
|-- AB_amnt: long (nullable = true)
|-- AA_amnt: long (nullable = true)
|-- AC_amnt: long (nullable = true)
|-- load_date: date (nullable = true)
+----+-------+-------+-------+---------+
|code|AB_amnt|AA_amnt|AC_amnt|load_date|
+----+-------+-------+-------+---------+
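The code that produced this output is not included in the digest; here is a minimal sketch that reproduces the schema above from the sample data in the original question (the explicit schema and the to_date() cast are assumptions on my part, inferred from the imports and the printed types):

schema = StructType([
    StructField("code", IntegerType(), True),
    StructField("doc_type", StringType(), True),
    StructField("amount", IntegerType(), True),
    StructField("load_date", StringType(), True),
])
df = spark.createDataFrame(
    [
        (1, 'AB', 12, '2022-01-01'),
        (1, 'AA', 22, '2022-01-10'),
        (1, 'AC', 11, '2022-01-11'),
        (2, 'AB', 22, '2022-02-01'),
        (2, 'AA', 28, '2022-02-10'),
    ],
    schema,
).withColumn("load_date", F.to_date("load_date"))

doc_types = ["AB", "AA", "AC"]
result = df.groupBy("code").agg(
    *[F.sum(F.when(F.col("doc_type") == dt, F.col("amount"))).alias(f"{dt}_amnt")
      for dt in doc_types],
    F.first("load_date").alias("load_date"),
)
result.printSchema()
result.show()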
Hi Everyone,
Hope you are doing well.
I have the following dataframe:
df = spark.createDataFrame(
    [
        [1, 'AB', 12, '2022-01-01'],
        [1, 'AA', 22, '2022-01-10'],
        [1, 'AC', 11, '2022-01-11'],
        [2, 'AB', 22, '2022-02-01'],
        [2, 'AA', 28, '2022-02-10'],