Hello Chetan, I don’t know about Scala, but in PySpark there is no elegant way of dropping NAs along the column axis.
Here is a possible solution to your problem:

>>> data = [(None, 1, 2), (0, None, 2), (0, 1, 2)]
>>> columns = ('A', 'B', 'C')
>>> df = spark.createDataFrame(data, columns)
>>> df.show()
+----+----+---+
|   A|   B|  C|
+----+----+---+
|null|   1|  2|
|   0|null|  2|
|   0|   1|  2|
+----+----+---+

>>> for column in df.columns:
...     if df.select(column).where(df[column].isNull()).first():
...         df = df.drop(column)
...
>>> df.show()
+---+
|  C|
+---+
|  2|
|  2|
|  2|
+---+

If your dataframe doesn’t exceed the size of your memory, I suggest you bring it into Pandas:

>>> df_pd = df.toPandas()
>>> df_pd
     A    B  C
0  NaN  1.0  2
1  0.0  NaN  2
2  0.0  1.0  2
>>> df_pd = df_pd.dropna(axis='columns')
>>> df_pd
   C
0  2
1  2
2  2

which you can then bring back into Spark:

>>> df = spark.createDataFrame(df_pd)
>>> df.show()
+---+
|  C|
+---+
|  2|
|  2|
|  2|
+---+

Hope that helps.

Regards,
V

> On 27 Feb 2021, at 05:25, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> Hi Users,
>
> What is equivalent of df.dropna(axis='columns') of Pandas in the Spark/Scala?
>
> Thanks
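P.S. For reference, here is the Pandas step as a standalone, runnable sketch (it assumes only pandas and numpy are installed; the data literal mirrors the toy example above, and the `thresh` variant at the end is just an extra option worth knowing about, not something from the thread):

```python
import numpy as np
import pandas as pd

# Same toy data as above; None becomes NaN in the float columns.
df_pd = pd.DataFrame(
    [(np.nan, 1, 2), (0, np.nan, 2), (0, 1, 2)],
    columns=['A', 'B', 'C'],
)

# Drop every column that contains at least one NaN.
cleaned = df_pd.dropna(axis='columns')
print(list(cleaned.columns))  # ['C']

# dropna also accepts how='all' or thresh=N (keep columns with at
# least N non-null values) if dropping on any null is too aggressive.
mostly_complete = df_pd.dropna(axis='columns', thresh=2)
print(list(mostly_complete.columns))  # ['A', 'B', 'C']
```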