Hi,
My dataframe has 2000 rows. Processing each row takes about 3 seconds, so running sequentially takes 2000 * 3 = 6000 seconds, which is far too long.
To cut this down, I am planning to run the function in parallel. For example, I would like to split the dataframe into 4 chunks of 500 rows each and call my PySpark function on each chunk concurrently. Is there a library or PySpark function I can leverage to do this execution in parallel?
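For illustration, here is a rough sketch of the kind of chunked parallel run I have in mind, using Python's multiprocessing on a pandas dataframe; process_row, process_chunk, and the toy dataframe are just placeholders for my actual code:

```python
import time
from multiprocessing import Pool

import numpy as np
import pandas as pd

def process_row(row):
    """Stand-in for my real per-row function (~3 s per row in reality)."""
    time.sleep(0.01)  # simulated work
    return row["value"] * 2

def process_chunk(chunk):
    """Apply the row-level function to one ~500-row slice."""
    return chunk.apply(process_row, axis=1)

if __name__ == "__main__":
    df = pd.DataFrame({"value": range(2000)})  # toy stand-in for my 2000-row dataframe
    chunks = np.array_split(df, 4)             # four slices of ~500 rows each
    with Pool(processes=4) as pool:            # one worker per chunk
        results = pool.map(process_chunk, chunks)
    out = pd.concat(results)
    print(len(out))  # 2000
```

Would something along these lines work, or is there a built-in PySpark route (e.g. repartitioning the data and using mapInPandas) that would be more appropriate? I would really appreciate your feedback at your earliest convenience.

Thanks,
Debu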