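Once you call .collect, Spark brings all the data from the worker machines back to the driver (the master node in your setup) and holds it there in a single in-memory array. If the dataset is too large for the driver's heap, the driver will fail with an OutOfMemoryError, and your job goes down with it.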
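If you only need a summary or a peek at the data, you can avoid pulling everything into the driver. Below is a minimal sketch of the usual alternatives. It assumes Spark 1.3 or later (so pair-RDD methods such as mapValues resolve without extra imports) and uses a hypothetical rdd1 of (String, Int) pairs. CollectAlternatives and summarize are illustrative names, not Spark APIs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Sketch: alternatives to collect() that keep driver memory bounded.
// Assumes Spark 1.3+; the object and method names are hypothetical.
object CollectAlternatives {

  /** Returns (record count, a 5-record sample, records streamed through the driver). */
  def summarize(rdd1: RDD[(String, Int)]): (Long, Array[(String, Int)], Long) = {
    // Still distributed: mapValues is a lazy transformation, nothing reaches the driver yet.
    val zeroed: RDD[(String, Int)] = rdd1.mapValues(_ => 0)

    // 1. Aggregate on the executors; only a single Long comes back to the driver.
    val n = zeroed.count()

    // 2. Fetch a bounded sample instead of the whole dataset.
    val sample = zeroed.take(5)

    // 3. Stream one partition at a time; the driver needs roughly as much
    //    memory as the largest partition rather than the whole RDD.
    val streamed = zeroed.toLocalIterator.size.toLong

    (n, sample, streamed)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is for illustration only; on a cluster, action results go to the driver.
    val conf = new SparkConf().setAppName("collect-alternatives").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for rdd1 from the question.
      val rdd1 = sc.parallelize(1 to 1000000).map(i => (s"key$i", i))
      val (n, sample, streamed) = summarize(rdd1)
      println(s"count=$n sample=${sample.mkString(", ")} streamed=$streamed")
    } finally {
      sc.stop()
    }
  }
}
```

count, take and toLocalIterator are standard RDD methods. toLocalIterator runs one job per partition, so it trades speed for a smaller driver footprint. If you really need the full contents of rdd2, keeping them distributed (for example, writing them out from the executors with saveAsTextFile) avoids the driver bottleneck. collect is only safe when the result is known to be small.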
For your example,

    val rdd2 = rdd1.mapValues(v => 0).collect

returns an Array of (key, 0) pairs, one for every record in rdd1, so its size grows with the dataset even though every value is 0.

Thanks
Best Regards

On Fri, Nov 7, 2014 at 10:41 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> Hi,
>
> The collect method returns an Array. If I have a huge set of data and I do
> something like the following:
>
>     val rdd2 = rdd1.mapValues(v => 0).collect // where rdd1 is some key-value pair RDD
>
> As per my understanding, this will return an array of (String, Int), and if my
> data is huge, this will return a huge array.
>
> Will there not be any problems pertaining to memory if I do this? In other
> words, for a huge dataset, collect will create a huge array. Can any memory
> issues arise if I use collect?
>
> Thank You