I believe dropDuplicates only works when we need to drop duplicate ROWS.

Here I want to drop columns which contain only one unique value.
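
For illustration, a minimal sketch of that idea (assuming a DataFrame df; the helper name is hypothetical): select each column, keep at most two distinct values, and drop the column if fewer than two come back. Whether Spark can actually stop scanning early here depends on the physical plan, so treat this as a sketch, not a guaranteed short-circuit:

    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: drop every column holding a single distinct value.
    // distinct().limit(2) asks for at most two distinct values per column.
    def dropConstantColumns(df: DataFrame): DataFrame = {
      val constantCols = df.columns.filter { c =>
        df.select(c).distinct().limit(2).count() < 2
      }
      df.drop(constantCols: _*)
    }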


On 2018-05-31 at 11:16, Divya Gehlot wrote:
You can try the dropDuplicates function:

https://github.com/spirom/LearningSpark/blob/master/src/main/scala/dataframe/DropDuplicates.scala
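
For reference, a minimal usage sketch (assuming a DataFrame df with a hypothetical "id" column) showing that dropDuplicates operates on rows, not columns:

    val deduped     = df.dropDuplicates()      // drops fully identical rows
    val dedupedById = df.dropDuplicates("id")  // dedupes on the "id" column only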

On 31 May 2018 at 16:34, <julio.ces...@free.fr> wrote:

Hi there!

I have a potentially large dataset (in terms of both rows and
columns).

I want to find the fastest way to drop the columns that are useless
to me, i.e. the columns containing only a single unique value.

What do you think I could do to achieve this as fast as possible
using Spark?

I already have a solution using distinct().count() or
approxCountDistinct().
But these may not be the best choice, as they require going through
all the data, even when the first two values tested for a column
already differ (in which case I know I can keep the column).
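
For illustration, a minimal sketch of that single-pass approach (assuming a DataFrame df; approx_count_distinct is the current name of approxCountDistinct in org.apache.spark.sql.functions): estimate the distinct count of every column in one aggregation, then drop the columns whose estimate is at most 1.

    import org.apache.spark.sql.functions.approx_count_distinct

    val cols = df.columns
    // One job over the data: an approximate distinct count per column.
    val counts = df.agg(
      approx_count_distinct(cols.head),
      cols.tail.map(c => approx_count_distinct(c)): _*
    ).first()

    // Columns whose estimated distinct count is at most 1 are constant.
    val constantCols = cols.indices.collect {
      case i if counts.getLong(i) <= 1 => cols(i)
    }
    val cleaned = df.drop(constantCols: _*)

As noted above, this still scans all the data; it just does so in a single job instead of one job per column.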

Thanks for your ideas!

Julien


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
