Hi Team,
I am trying to understand how to estimate Kubernetes CPU requests with
respect to Spark executor cores.
For example,
Job configuration (as submitted at start):
cores/executor = 4
# of executors = 240
But the resources actually allocated when the job ran were as follows:
cores/executor = 4
# of executors = 47
So the question is: how is the number of executors determined, and why did
we get only 47 executors when we requested 240?
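For reference, a rough sketch in PySpark of how these knobs map onto
Kubernetes pod requests (the values are hypothetical; whether 240 executors
actually materialize depends on cluster capacity, quotas, and dynamic
allocation):

from pyspark.sql import SparkSession

# Hypothetical sizing for illustration. On Kubernetes, each executor pod's
# CPU request comes from spark.kubernetes.executor.request.cores, which
# defaults to spark.executor.cores when not set explicitly.
spark = (
    SparkSession.builder
    .appName("kube-cpu-sizing-sketch")
    .config("spark.executor.cores", "4")        # task slots per executor
    .config("spark.executor.instances", "240")  # requested, not guaranteed
    .config("spark.kubernetes.executor.request.cores", "3.5")  # pod CPU request
    .getOrCreate()
)

If the namespace quota or node capacity cannot fit 240 such pods, the
scheduler simply starts fewer executor pods, which is one common way a
240-executor request ends up as 47.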
Hi All,
I was wondering if we have any best practices for using pandas UDFs?
Profiling a UDF is not an easy task, and our case requires some drilling
down into the logic of the function.
Our use case:
We are using func(DataFrame) => DataFrame as the interface for our pandas
UDFs, and so far we run it locally only.
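Since the UDF body is plain pandas code, one way to drill down locally is
the standard library profiler, entirely outside Spark; a minimal sketch,
with func standing in for our actual logic:

import cProfile

import pandas as pd

def func(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the real transformation.
    return pdf.assign(doubled=pdf["value"] * 2)

# Profile the function on a representative sample; no Spark involved.
sample = pd.DataFrame({"value": range(100_000)})
cProfile.run("func(sample)", sort="cumulative")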
>>>> You may be doing something expensive in each UDF call; consider
>>>> amortizing it with the scalar iterator UDF pattern. Maybe.
>>>>
>>>> A pandas UDF is not Spark code itself, so no, there is no tool in Spark
>>>> to profile it. Conversely, any approach to profiling ordinary Python
>>>> code would apply.
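For reference, a minimal sketch of the scalar iterator pattern mentioned
above (load_model is a hypothetical stand-in for whatever per-call setup is
expensive):

from typing import Iterator

import pandas as pd
from pyspark.sql.functions import pandas_udf

def load_model():
    # Hypothetical stand-in for an expensive one-time initialization.
    class Model:
        def predict(self, s: pd.Series) -> pd.Series:
            return s * 2.0
    return Model()

@pandas_udf("double")
def scored(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Setup runs once per task rather than once per batch, which is
    # the amortization the scalar iterator form buys you.
    model = load_model()
    for batch in batches:
        yield model.predict(batch)

It is applied like any other UDF, e.g. df.select(scored("value")); the only
difference from a plain scalar pandas UDF is where the setup cost lands.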
Hi Team,
I am working on a basic streaming aggregation where I have one file-stream
source and two write sinks (Hudi tables). The only difference between the
two branches is the aggregation performed; hence I am using the same Spark
session to perform both operations.
(File Source)
--> Agg1 -> DF1 -> Sink1 (Hudi)
--> Agg2 -> DF2 -> Sink2 (Hudi)
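A minimal sketch of that topology in PySpark Structured Streaming, in case
it helps frame the question (paths, schema, and aggregations are
hypothetical, and parquet stands in for the Hudi sink to keep the sketch
generic; each query needs its own checkpoint location even though they
share one SparkSession):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("two-sink-agg-sketch").getOrCreate()

# Single file-stream source feeding both branches.
source = (
    spark.readStream
    .schema("id STRING, amount DOUBLE, ts TIMESTAMP")
    .json("/data/in")
    .withWatermark("ts", "15 minutes")
)

# Two different aggregations over the same input.
df1 = (source.groupBy(F.window("ts", "10 minutes"), "id")
       .agg(F.sum("amount").alias("total")))
df2 = (source.groupBy(F.window("ts", "10 minutes"))
       .agg(F.count("*").alias("events")))

# Two independent streaming queries; each tracks its own offsets and
# re-reads the source files on its own schedule.
q1 = (df1.writeStream.outputMode("append")
      .format("parquet")
      .option("path", "/data/out/agg1")
      .option("checkpointLocation", "/chk/agg1")
      .start())
q2 = (df2.writeStream.outputMode("append")
      .format("parquet")
      .option("path", "/data/out/agg2")
      .option("checkpointLocation", "/chk/agg2")
      .start())

spark.streams.awaitAnyTermination()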