Hi Abdeali,
Thanks for the support. Indeed, feel free to go ahead and test and review my
latest PR for SPARK-34265
(Instrument Python UDF execution using SQL Metrics):
https://github.com/apache/spark/pull/33559
Currently, I have reduced the scope of the instrumentation to just 3 si
Hello Members,
I have a query about Spark stages:
Why does every stage have a different number of tasks/partitions in Spark? Or
how is it determined?
Moreover, where can I see the improvements made in Spark 3+?
Thanks in advance
Rajat
Hi,
I do send those metrics back as columns in the pandas DataFrames when
required, but the real goal is to finally be able to measure the time spent on
Java object conversion alongside the UDF calls, as well as the actual Python
memory usage and other details, all of which we can capture by tweaking the
UDF (see the rough sketch below).
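For reference, here is a minimal sketch of that "metrics as extra columns" idea; the
function name and schema are mine, not from this thread or any Spark API. A scalar
pandas UDF times its own body and reports the worker's peak RSS next to the result.
Note this only captures the Python-side numbers; the Java-side serialization time is
what the SQL-metrics instrumentation in the PR above is meant to expose.

import time
import resource  # Unix-only; used here just to read the worker's peak RSS

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf
from pyspark.sql.types import DoubleType, StructField, StructType

# Result row: the actual UDF output plus two metric columns.
metrics_schema = StructType([
    StructField("result", DoubleType()),
    StructField("udf_seconds", DoubleType()),
    StructField("max_rss_kb", DoubleType()),
])

@pandas_udf(metrics_schema)
def instrumented_udf(v: pd.Series) -> pd.DataFrame:
    start = time.perf_counter()
    result = v * 2.0  # placeholder for the real per-batch work
    elapsed = time.perf_counter() - start
    # Peak resident set size of this Python worker (KB on Linux, bytes on macOS).
    rss = float(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return pd.DataFrame({
        "result": result,
        "udf_seconds": elapsed,
        "max_rss_kb": rss,
    })

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000).select(col("id").cast("double").alias("v"))
# Expand the struct so the metrics come back as ordinary columns.
df.select(instrumented_udf("v").alias("m")).select("m.*").show(3)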
B
Hello Rajat,
Look up the Spark *pipelining* concept: any sequence of operations that feed
data directly into each other without the need for shuffling will be packed
into a single stage, e.g. select -> filter -> select (Spark SQL) or map ->
filter -> map (RDD). For any operation that requires shuffling (sort
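A small illustration of that boundary (PySpark; the column names are mine): the
chained narrow operations below produce a physical plan with no Exchange node, so
they run as one stage with one task per input partition, while the groupBy adds an
Exchange and the stage after it gets spark.sql.shuffle.partitions tasks.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# select -> filter -> select: all narrow, so the whole chain is pipelined into
# a single stage with one task per input partition.
pipelined = (
    df.select((F.col("id") * 2).alias("x"))
      .filter("x % 3 = 0")
      .select("x")
)
pipelined.explain()  # physical plan has no Exchange node

# groupBy needs a shuffle: an Exchange appears in the plan, and the stage after
# it runs with spark.sql.shuffle.partitions tasks (200 by default).
grouped = pipelined.groupBy((F.col("x") % 10).alias("bucket")).count()
grouped.explain()
print(spark.conf.get("spark.sql.shuffle.partitions"))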