Hi All,

I have a situation where an RDD has some empty partitions, which I
would like to identify and handle while applying mapPartitions or similar
functions. Is there a way to do this in PySpark? The isEmpty method works
only on the RDD itself and cannot be applied to the partition iterator.

Much appreciated.

Code block:

list1 = [1, 2, 3, 3, 6, 7, 8, 12, 6, 23, 45, 76, 9, 10]

r1 = sc.parallelize(list1, 20)   # 14 elements across 20 partitions, so some partitions are empty

def adder(iterator):
    if iterator.isEmpty():       # fails: a plain Python iterator has no isEmpty method
        yield 'None'
    else:
        yield sum(iterator)

print(r1.mapPartitions(adder).collect())
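
The workaround I am considering (an untested sketch, using plain Python
iteration rather than any Spark API) is to peek at the first element with
next() and treat StopIteration as the empty-partition case:

def adder(iterator):
    # Pull the first element; StopIteration means the partition is empty.
    try:
        first = next(iterator)
    except StopIteration:
        yield 'None'
    else:
        # Add the peeked element back into the partition total.
        yield first + sum(iterator)

print(r1.mapPartitions(adder).collect())

Is this the recommended approach, or is there a cleaner way?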

Thanks & Best Regards,
Kanchan
Data Engineer, IBM
