I have an RDD which contains 14 different columns. I need to find the distinct values across all the columns of the RDD and write them to HDFS.
How can I achieve this? Is there any distributed data structure that I can use and keep updating as I traverse the new rows?
Regards, Abhi
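For illustration, here is a minimal PySpark sketch of one common approach. It does not maintain a shared, mutable structure across rows. Instead it expresses the job as transformations: each row is turned into (column_index, value) pairs, and `distinct()` removes duplicates with a cluster-wide shuffle. Several things here are assumptions and not part of the question: the input is comma-separated text, "distinct across all the columns" is read as "distinct values per column", and the HDFS paths and helper names (`parse_row`, `to_column_value_pairs`, `format_pair`, `local_distinct_per_column`) are hypothetical.

```python
# Sketch only: distinct values per column of a 14-column RDD, written to HDFS.
# Assumptions: comma-separated text input; HDFS paths below are hypothetical placeholders.

NUM_COLUMNS = 14  # column count from the question; not enforced by this sketch


def parse_row(line):
    """Split one comma-separated line into a tuple of column values (tuples are hashable)."""
    return tuple(line.split(","))


def to_column_value_pairs(row):
    """Emit one (column_index, value) pair per column so equal values in different columns stay distinct."""
    return [(i, value) for i, value in enumerate(row)]


def format_pair(pair):
    """Render a (column_index, value) pair as a tab-separated output line."""
    column_index, value = pair
    return f"{column_index}\t{value}"


def local_distinct_per_column(rows):
    """Plain-Python equivalent of the Spark pipeline, for checking the logic on small data."""
    result = {}
    for row in rows:
        for i, value in to_column_value_pairs(row):
            result.setdefault(i, set()).add(value)
    return result


def main():
    # Imported here so the helpers above can be used without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext(appName="distinct-per-column")
    rows = sc.textFile("hdfs:///path/to/input").map(parse_row)  # hypothetical input path
    # flatMap spreads each row into per-column pairs; distinct() deduplicates them via a shuffle.
    distinct_pairs = rows.flatMap(to_column_value_pairs).distinct()
    distinct_pairs.map(format_pair).saveAsTextFile("hdfs:///path/to/output")  # hypothetical output path
    sc.stop()
```

To run it, `main()` would be called from a driver script submitted with `spark-submit`. If "distinct" instead means distinct whole rows (all 14 columns together), `rows.distinct()` on the tuple-valued RDD would be enough; `parse_row` returns tuples rather than lists because Python lists are not hashable and cannot be deduplicated that way.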