You can use distinct on your DataFrame or RDD, e.g. rdd.distinct.
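A minimal sketch of the distinct call, assuming Spark 2.x, a local master for trying it out, and rows held as tuples. The sample data, the three columns standing in for the 14, and the output path are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object DistinctRows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-rows")
      .master("local[*]") // assumption: local run; drop this when submitting to a cluster
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample rows; three columns stand in for the 14 in the question.
    val rows = sc.parallelize(Seq(
      ("a", 1, "x"),
      ("a", 1, "x"), // exact duplicate of the first row
      ("b", 2, "y")
    ))

    // distinct() compares whole elements, so a row is dropped only when
    // every column matches another row.
    val unique = rows.distinct()

    // Hypothetical output path; saveAsTextFile fails if the directory already exists.
    unique
      .map(_.productIterator.mkString(","))
      .saveAsTextFile("hdfs:///tmp/distinct-rows")

    spark.stop()
  }
}
```

distinct() relies on the equals and hashCode of each row, so tuples and case classes work as expected. If the rows are Array values, converting them first (for example with .toSeq) is probably needed, since Scala arrays compare by reference. On a DataFrame the equivalent call is df.distinct().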
It will give you the distinct rows, comparing all columns of each row.

On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand <abhis.anan...@gmail.com> wrote:

> I have an rdd which contains 14 different columns. I need to find the
> distinct across all the columns of rdd and write it to hdfs.
>
> How can I achieve this?
>
> Is there any distributed data structure that I can use and keep on
> updating it as I traverse the new rows?
>
> Regards,
> Abhi

--
Thanks and Regards,
Saurav Sinha

Contact: 9742879062