Hi Gourav,

Since Koalas needs the same round trip between the JVM and Python, I expect UDF performance to be nearly the same in either API.
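On the profiling question, one option is to measure how much time is spent inside the Python function itself, so it can be compared with the total task time. Below is a minimal sketch using only the standard library's cProfile. The names `profiled`, `top`, `report`, and `transform` are hypothetical helpers made up for this example, not Spark or pandas APIs, and `transform` is only a stand-in for the real func(DataFrame) => DataFrame logic.

```python
import cProfile
import functools
import io
import pstats


def profiled(func, top=10, report=print):
    """Wrap func so that every call is profiled with cProfile.

    `profiled`, `top`, and `report` are hypothetical names for this sketch.
    `report` receives the formatted profile text; with the default `print`,
    the text goes to the stdout of whichever process runs the function.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        # runcall profiles one call and returns the function's own result.
        result = profiler.runcall(func, *args, **kwargs)
        buf = io.StringIO()
        stats = pstats.Stats(profiler, stream=buf)
        # Show the `top` entries with the largest cumulative time.
        stats.sort_stats("cumulative").print_stats(top)
        report(buf.getvalue())
        return result

    return wrapper


def transform(rows):
    # Stand-in for the real per-batch logic (assumption for illustration).
    return [len(r) for r in rows]


# One large value (about 600 KB, as in the use case) and one small value.
profiled_transform = profiled(transform)
sizes = profiled_transform([b"x" * 600_000, b"y" * 10])
```

In Spark, the wrapped function could be passed wherever the original one is used, for example to `mapInPandas` or `applyInPandas`. The report covers only the time spent inside the Python function, so a large gap between that and the overall task time would suggest that the cost is in the Arrow serialization and the JVM/Python transfer rather than in the function logic.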
Cheers
Andrew

On Thu, Aug 25, 2022 at 11:22 AM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> Maybe I am jumping to conclusions and making stupid guesses, but have you
> tried Koalas now that it is natively integrated with PySpark?
>
> Regards
> Gourav
>
> On Thu, 25 Aug 2022, 11:07 Subash Prabanantham, <subashpraba...@gmail.com>
> wrote:
>>
>> Hi All,
>>
>> I was wondering if we have any best practices for using pandas UDFs.
>> Profiling a UDF is not an easy task, and our case requires some drilling
>> down into the logic of the function.
>>
>>
>> Our use case:
>> We are using func(Dataframe) => Dataframe as the interface for the pandas
>> UDF. When we run only the function locally it is fast, but when it is
>> executed in a Spark environment the processing time is longer than
>> expected. We have one column where the value is large (BinaryType ->
>> 600KB), and we are wondering whether this could make the Arrow computation
>> slower.
>>
>> Is there any profiling tool or recommended way to debug the cost incurred
>> by a pandas UDF?
>>
>>
>> Thanks,
>> Subash
>>