Hi All,

I am trying to read a table from a relational database using Spark 2.x.

I am using code like the following:

sparkSession.read().jdbc(url, table,
        connectionProperties).select("SELECT_COLUMN").where(whereClause);


Now, what's happening is that the SQL query Spark actually runs against the
relational database is:

select SELECT_COLUMN, (where_clause_columns) from table WHERE SELECT_COLUMN
IS NOT NULL;

And I guess it applies the where-clause filter only after fetching from the
DB all the rows where SELECT_COLUMN IS NOT NULL.
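
One way to confirm which filters actually get pushed down would be to print
the physical plan; the JDBC scan node lists them under "PushedFilters". A
minimal sketch, assuming the same url, table, connectionProperties and
whereClause as above, with the result held in a DataFrame named df:

    Dataset<Row> df = sparkSession.read()
            .jdbc(url, table, connectionProperties)
            .select("SELECT_COLUMN")
            .where(whereClause);

    // The extended plan prints the logical and physical plans; the
    // JDBCRelation scan shows which filters were pushed to the database.
    df.explain(true);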

I searched around and found that this is because of predicate pushdown. Is
there a way to load data into the DataFrame using a specific query instead
of this?

I found a suggested solution: if we provide the actual query instead of the
table name in the following code, Spark should run that query exactly:

table = "select SELECT_COLUMN from table  "+ whereClause;
sparkContext.read().jdbc(url, table ,
connectionProperties).select('SELECT_COLUMN').where(whereClause);
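
For reference, here is a fuller, self-contained sketch of that approach.
The url, credentials, table and filter below are placeholders I made up;
the key detail is that the subquery has to be wrapped in parentheses and
aliased, because Spark embeds it inside its own SELECT statement:

    import java.util.Properties;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession sparkSession = SparkSession.builder()
            .appName("jdbc-subquery-read")
            .getOrCreate();

    Properties connectionProperties = new Properties();
    connectionProperties.put("user", "db_user");          // placeholder
    connectionProperties.put("password", "db_password");  // placeholder

    String url = "jdbc:postgresql://db-host:5432/mydb";   // placeholder URL
    String whereClause = "where some_column = 'x'";       // placeholder filter

    // Parenthesized and aliased so Spark can wrap it in its own SELECT.
    String table = "(select SELECT_COLUMN from my_table " + whereClause + ") AS t";

    Dataset<Row> df = sparkSession.read().jdbc(url, table, connectionProperties);
    df.show();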


Does the above seem like a good solution?
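
As an aside, the JDBC source in Spark 2.4+ also appears to accept the query
directly via a "query" option, which would avoid the parenthesized-subquery
trick; a minimal sketch, with the same placeholder url, credentials and
whereClause as above:

    Dataset<Row> df = sparkSession.read()
            .format("jdbc")
            .option("url", url)
            .option("user", "db_user")          // placeholder
            .option("password", "db_password")  // placeholder
            // "query" is mutually exclusive with "dbtable"
            .option("query", "select SELECT_COLUMN from my_table " + whereClause)
            .load();

Would that be the better route on a recent 2.x release?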


Regards,
Mohit
