Hi All,

I am trying to read a table from a relational database using Spark 2.x. I am using code like the following:

    Dataset<Row> df = sparkSession.read()
            .jdbc(url, table, connectionProperties)
            .select("SELECT_COLUMN")
            .where(whereClause);

Now, what's happening is that the SQL query Spark actually runs against the relational DB is:

    SELECT SELECT_COLUMN, <where_clause_columns> FROM table WHERE SELECT_COLUMN IS NOT NULL

and I guess it does the filtering based on my where clause only after fetching from the DB all the rows where SELECT_COLUMN IS NOT NULL. I searched about it and found out this is because of predicate pushdown.

Is there a way to load the data into a DataFrame using a specific query instead? I found a solution: if we provide the actual query instead of the table name, Spark should run exactly that query. Since the JDBC source puts this string where a table name would go, the query has to be wrapped in parentheses and given an alias:

    String table = "(SELECT SELECT_COLUMN FROM my_table " + whereClause + ") AS t";
    Dataset<Row> df = sparkSession.read()
            .jdbc(url, table, connectionProperties);

To make this concrete, I have put two self-contained sketches below.
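First, a minimal sketch of the read path I described; the url, credentials, table name, and filter are placeholders I made up, not my real setup. Calling explain(true) prints the physical plan, and the "PushedFilters: [...]" entry on the JDBC scan node shows which predicates were actually sent to the database:

    import java.util.Properties;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JdbcPushdownCheck {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("jdbc-pushdown-check")
                    .master("local[*]")
                    .getOrCreate();

            // Placeholder connection details, adjust for the real database.
            String url = "jdbc:postgresql://dbhost:5432/mydb";
            Properties connectionProperties = new Properties();
            connectionProperties.setProperty("user", "myuser");
            connectionProperties.setProperty("password", "mypassword");

            Dataset<Row> df = spark.read()
                    .jdbc(url, "my_table", connectionProperties)
                    .select("SELECT_COLUMN")
                    .where("SELECT_COLUMN > 100");

            // The physical plan shows which filters were pushed to the DB:
            // look for "PushedFilters: [...]" in the JDBC scan node.
            df.explain(true);

            spark.stop();
        }
    }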
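Second, an alternative I am considering, assuming we end up on Spark 2.4 or later (I understand the JDBC source gained a "query" option in 2.4 that takes the statement directly, so no parentheses or alias are needed). Here spark and url are the same placeholders as in the sketch above:

    // Assumes Spark 2.4+: "query" replaces the table-name / "dbtable"
    // argument and the statement is executed essentially as written.
    Dataset<Row> df2 = spark.read()
            .format("jdbc")
            .option("url", url)
            .option("user", "myuser")
            .option("password", "mypassword")
            .option("query",
                    "SELECT SELECT_COLUMN FROM my_table WHERE SELECT_COLUMN > 100")
            .load();
    df2.show();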
Does the subquery approach above seem like a good solution?

Regards,
Mohit