You can use distinct on your DataFrame or RDD:

rdd.distinct

It returns the distinct rows: rows that are identical in every column are
collapsed into one.
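
As a rough sketch of how this could fit together, assuming Spark 2.x with
Scala: the object name, the sample data, and the output path below are made
up for illustration, and three columns stand in for your 14.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DistinctRows {
  // Rows are modelled as Seq[String] rather than Array[String]: Scala arrays
  // compare by reference, so distinct would not collapse equal-looking arrays.
  def distinctRows(rows: RDD[Seq[String]]): RDD[Seq[String]] = rows.distinct()

  def main(args: Array[String]): Unit = {
    // Hypothetical output location; on a cluster pass an hdfs:// URI instead.
    val outputPath = args.headOption.getOrElse("/tmp/distinct-rows")

    val spark = SparkSession.builder()
      .appName("distinct-rows")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // Made-up sample data; the second row duplicates the first.
    val rows = spark.sparkContext.parallelize(Seq(
      Seq("a", "1", "x"),
      Seq("a", "1", "x"),
      Seq("b", "2", "y")
    ))

    // Write one comma-separated line per distinct row.
    // saveAsTextFile fails if outputPath already exists.
    distinctRows(rows).map(_.mkString(",")).saveAsTextFile(outputPath)

    spark.stop()
  }
}
```

With a DataFrame the corresponding call is df.distinct(). Either way the
de-duplication is done by Spark across the cluster, so you should not need
to maintain a shared structure yourself while traversing the rows.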

On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand <abhis.anan...@gmail.com>
wrote:

> I have an rdd which contains 14 different columns. I need to find the
> distinct across all the columns of rdd and write it to hdfs.
>
> How can I achieve this?
>
> Is there any distributed data structure that I can use and keep on
> updating it as I traverse the new rows?
>
> Regards,
> Abhi
>



-- 
Thanks and Regards,

Saurav Sinha

Contact: 9742879062
