Hi,

We intend to do a stream-static join where Kafka is the streaming source and an RDBMS is the static source.
For example, user activity data comes in as a stream from a Kafka source, and we need to pull user personal details from PostgreSQL. Because PostgreSQL is a static source, the entire "User-Personal-Details" table is reloaded into Spark memory for every microbatch. Is there a way to optimise this? For example, we should be able to collect the user IDs from each microbatch and then issue a query such as:

select * from user_personal_details where user_id in (<list-of-user-ids-from-current-microbatch>)

Since we could not find a clean way to do this, we chose to open a JDBC connection in every microbatch and achieved the above optimisation. But that is still a suboptimal solution, because a new JDBC connection is created for every microbatch. Is there a way to pool JDBC connections in Structured Streaming?

Thanks & regards,
Arti Pande
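To make the per-microbatch lookup concrete, here is a minimal sketch (outside Spark, with hypothetical table and column names `user_personal_details` / `user_id`) of building the parameterized IN (...) query from the IDs collected in one microbatch. Placeholders are used instead of string interpolation so the driver (e.g. psycopg2) can bind values safely:

```python
def build_user_query(user_ids):
    """Build a parameterized SELECT ... WHERE user_id IN (...) query
    for the user IDs seen in the current microbatch."""
    # One placeholder per ID; the DB driver substitutes the values.
    placeholders = ", ".join(["%s"] * len(user_ids))
    sql = f"SELECT * FROM user_personal_details WHERE user_id IN ({placeholders})"
    return sql, list(user_ids)

# Usage: IDs collected from one microbatch of the Kafka stream.
sql, params = build_user_query([101, 102, 103])
# sql is "SELECT * FROM user_personal_details WHERE user_id IN (%s, %s, %s)"
```

In Spark itself this would typically live inside a foreachBatch callback, where the batch DataFrame's distinct IDs are collected to the driver before the JDBC read; the sketch above only shows the query-construction step.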
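On the pooling question: one common pattern is to hold connections in a pool that outlives individual microbatches (e.g. a lazily initialized singleton per executor), so each batch borrows and returns a connection instead of reopening one. The sketch below is a generic, self-contained object pool with a hypothetical `make_conn` factory standing in for a real psycopg2/JDBC connect call; it is illustrative only, not Spark API:

```python
import queue

class ConnectionPool:
    """Minimal object pool: connections are created lazily and reused
    across microbatches instead of being reopened for each one."""

    def __init__(self, factory, max_size=4):
        self._factory = factory
        self._pool = queue.Queue(maxsize=max_size)
        self._created = 0
        self._max_size = max_size

    def acquire(self):
        try:
            # Reuse an idle connection if one is available.
            return self._pool.get_nowait()
        except queue.Empty:
            if self._created < self._max_size:
                self._created += 1
                return self._factory()  # open a new physical connection
            return self._pool.get()     # at capacity: wait for a release

    def release(self, conn):
        self._pool.put(conn)

# Hypothetical factory standing in for psycopg2.connect(...).
counter = {"opened": 0}
def make_conn():
    counter["opened"] += 1
    return object()

pool = ConnectionPool(make_conn, max_size=2)
for _ in range(5):       # five "microbatches"
    conn = pool.acquire()
    # ... run the per-batch IN (...) query here ...
    pool.release(conn)   # return for reuse, do not close
```

With this pattern only one physical connection is ever opened across the five simulated batches. In production one would normally reach for an existing pooler (HikariCP on the JVM, psycopg2's pool module, or PgBouncer in front of PostgreSQL) rather than hand-rolling one.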