Hi,
I am experiencing performance issues in one of my PySpark applications. When I
look at the Spark UI, the file and line number of each entry is listed as
<unknown>. I would like to use the information in the Spark UI for debugging,
but without knowing the correct file and line number, the information is of
limited use.
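The closest I can get right now is tagging jobs by hand, roughly as in the
minimal sketch below. setJobDescription is public API; the "callSite.short"
local property is an internal Spark key, so treat that part as an assumption:

import inspect

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ui-call-sites").getOrCreate()
sc = spark.sparkContext

def tag_job(label=None):
    """Attach a human-readable label to the next job(s) in the Spark UI."""
    if label is None:
        # Fall back to the caller's Python file:line, since the JVM
        # cannot see Python stack frames on its own.
        caller = inspect.stack()[1]
        label = f"{caller.filename}:{caller.lineno}"
    sc.setJobDescription(label)                   # public API; Jobs tab
    sc.setLocalProperty("callSite.short", label)  # internal key, assumption

tag_job("orders: load + count")
spark.range(1000).count()   # this job carries the label in the UI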
The Encoder API remains a pain point due to its lack of composability.
The serialization overhead is also still there, I believe. I don't remember
what has happened to the predicate pushdown issues; I think they are mostly
resolved?
We tend to use the Dataset API on our methods/interfaces where it's fitting.
Hi Amit,
The only approach I can think of is to create two copies of schema_df1, one
partitioned on key1 and the other on key2, and then use these for the join.
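Roughly like this, as a minimal PySpark sketch. The paths and the second
inputs df_a/df_b are placeholders I made up; only schema_df1, key1, and key2
come from your mail:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("two-key-join").getOrCreate()

# Hypothetical inputs; adjust to your actual sources.
schema_df1 = spark.read.parquet("/path/to/schema_df1")
df_a = spark.read.parquet("/path/to/df_a")   # joins on key1
df_b = spark.read.parquet("/path/to/df_b")   # joins on key2

# Copy 1: shuffle once on key1, cache, then join on key1.
df1_by_key1 = schema_df1.repartition("key1").cache()
joined_a = df1_by_key1.join(df_a, on="key1")

# Copy 2: shuffle the same data on key2 for the other join.
df1_by_key2 = schema_df1.repartition("key2").cache()
joined_b = df1_by_key2.join(df_b, on="key2")

If schema_df1 is reused across many jobs, writing both copies out bucketed by
the respective key would avoid repeating the shuffle each run.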
From: Amit Joshi
Sent: 04 October 2021 19:13
To: spark-user
Subject: Re: [Spark] Opti