Hi

In the financial systems world, we often have reference data that is
updated frequently and is consumed by a long-running Spark job (6-7
hours). Typically the job reads that data once at startup, keeps it in
memory as a DataFrame, and then keeps running for the remaining hours.
If some other system updates the reference data in the meantime, the
Spark job's in-memory copy (the cached DataFrame) goes out of sync with
the source.
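
For concreteness, this is roughly the shape of the job I mean (the
paths, the "currency" join column, and the output location are all
made up for illustration):

import org.apache.spark.sql.SparkSession

object LongRunningJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("long-running-job").getOrCreate()

    // Reference data is read once at startup and cached in memory.
    val refData = spark.read.parquet("/data/reference/exchange_rates").cache()

    // The job then runs for hours, joining each new input batch against
    // the cached copy. If another system rewrites the reference path in
    // the meantime, this cached DataFrame never sees the update.
    for (batchPath <- args) {
      val trades = spark.read.parquet(batchPath)
      trades.join(refData, "currency")
        .write.mode("append").parquet("/data/output/enriched_trades")
    }

    spark.stop()
  }
}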

Is there a way to refresh that reference data in the Spark job's memory
(i.e. replace the cached DataFrame) while the job is running?
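
To clarify the kind of refresh I have in mind, something like the
following (same hypothetical path as above), though I don't know
whether dropping and re-reading the cache like this is safe or
idiomatic while the job is mid-flight:

// Held as a var so the cached copy can be swapped out.
var refData = spark.read.parquet("/data/reference/exchange_rates").cache()

def refreshReferenceData(): Unit = {
  refData.unpersist() // drop the stale cached blocks
  refData = spark.read.parquet("/data/reference/exchange_rates").cache()
}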

This seems to be a very common scenario. Is there a solution / workaround
for this?

Thanks & regards,
Arti Pande
