Hi,

The collect method returns an Array. If I have a huge set of data and I do
something like the following:

val rdd2 = rdd1.mapValues(v => 0).collect  // where rdd1 is some key-value pair RDD

As I understand it, this returns an Array[(String, Int)], and if my data is
huge, the returned array will be huge as well.
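
To make the situation concrete, here is a minimal sketch of what I mean. It assumes a local SparkSession and a tiny made-up rdd1; the object name CollectSketch and the sample data are purely illustrative and not my real job. The take(2) call is included only as a contrast, since it returns a bounded number of elements instead of all of them.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-alone example; names and data are illustrative only.
object CollectSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-sketch")
      .master("local[*]") // assumption: run locally for the sketch
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the real (huge) key-value RDD.
    val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))

    // Replace every value with 0, keeping the keys.
    val zeroed = rdd1.mapValues(_ => 0)

    // collect brings every element of the RDD back to the driver
    // as one in-memory array.
    val all: Array[(String, Int)] = zeroed.collect()

    // For contrast: take(n) returns at most n elements to the driver.
    val firstTwo: Array[(String, Int)] = zeroed.take(2)

    println(s"collected ${all.length} pairs, took ${firstTwo.length}")
    spark.stop()
  }
}
```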

Won't this cause memory problems? In other words, for a huge dataset collect
will create a huge array, so can I run into memory issues by using it?

Thank you
