I don't have personal experience with Koalas, but its dropna does seem to have the same API: https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.dropna.html
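
If you would rather stay in plain PySpark, note that the per-column loop quoted below launches one Spark job per column. A rough sketch that counts the nulls of every column in a single aggregation might look like the following. The names columns_to_drop and drop_null_columns are my own and not part of any library, and I am assuming column names without dots or other special characters. Also, isNull() does not catch NaN values in float columns, which pandas' dropna would.

```python
def columns_to_drop(null_counts):
    """Return the names of columns whose null count is non-zero.

    null_counts is a plain dict of {column name: number of nulls}.
    """
    return [name for name, count in null_counts.items() if count]


def drop_null_columns(df):
    """Drop every column of a Spark DataFrame that holds at least one null.

    Hypothetical helper: a sketch, not an official Spark API.
    """
    # Imported lazily so columns_to_drop stays usable without Spark installed.
    from pyspark.sql import functions as F

    # F.when(...) yields 1 for null cells and null otherwise; F.count only
    # counts non-null values, so each aggregate is the column's null count.
    counts = df.select(
        [F.count(F.when(F.col(c).isNull(), 1)).alias(c) for c in df.columns]
    ).first().asDict()

    to_drop = columns_to_drop(counts)
    return df.drop(*to_drop) if to_drop else df
```

With the example DataFrame from the quoted message, drop_null_columns(df) should keep only column C. The trade-off is one full scan of the data instead of one short-circuiting query per column, so the loop may still be faster when nulls show up early in most columns.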

On Fri, Feb 26, 2021 at 11:46 PM Vitali Lupusor <vitalilupu...@gmail.com> wrote:

> Hello Chetan,
>
> I don’t know about Scala, but in PySpark there is no elegant way of
> dropping NAs along the column axis.
>
> Here is a possible solution to your problem:
>
> >>> data = [(None, 1, 2), (0, None, 2), (0, 1, 2)]
> >>> columns = ('A', 'B', 'C')
> >>> df = spark.createDataFrame(data, columns)
> >>> df.show()
> +----+----+---+
> |   A|   B|  C|
> +----+----+---+
> |null|   1|  2|
> |   0|null|  2|
> |   0|   1|  2|
> +----+----+---+
> >>> for column in df.columns:
> ...     if df.select(column).where(df[column].isNull()).first():
> ...         df = df.drop(column)
> ...
> >>> df.show()
> +---+
> |  C|
> +---+
> |  2|
> |  2|
> |  2|
> +---+
>
> If your dataframe doesn’t exceed the size of your memory, I suggest you
> bring it into Pandas.
>
> >>> df_pd = df.toPandas()
> >>> df_pd
>      A    B  C
> 0  NaN  1.0  2
> 1  0.0  NaN  2
> 2  0.0  1.0  2
> >>> df_pd = df_pd.dropna(axis='columns')
> >>> df_pd
>    C
> 0  2
> 1  2
> 2  2
>
> You can then bring it back into Spark:
>
> >>> df = spark.createDataFrame(df_pd)
> >>> df.show()
> +---+
> |  C|
> +---+
> |  2|
> |  2|
> |  2|
> +---+
>
> Hope that helps.
>
> Regards,
> V
>
> On 27 Feb 2021, at 05:25, Chetan Khatri <chetan.opensou...@gmail.com>
> wrote:
>
> > Hi Users,
> >
> > What is the equivalent of *df.dropna(axis='columns')* of Pandas in
> > Spark/Scala?
> >
> > Thanks