What would be the recommended way to close resources opened or shared by
executors?

A few use cases:

#1) Let's say the enrichment process needs to convert an IP address or a
lat/long pair to a city/country. To achieve this, executors could open a
file in HDFS and build a map, or use a memory-mapped file. The
implementation could be a transient lazy val singleton or something
similar. The UDF would then perform lookups on these data structures and
return geo data.
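
To make use case #1 concrete, here is a minimal sketch of the lazily
initialized per-executor singleton, written in plain Java rather than as
Spark code. GeoLookup, its inline sample data, and the "unknown" fallback
are all hypothetical; a real version would read the file from HDFS instead
of parsing a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-JVM (that is, per-executor) holder for the geo table.
final class GeoLookup {
    private GeoLookup() {}

    // Initialization-on-demand holder: TABLE is built once, on first lookup.
    private static final class Holder {
        static final Map<String, String> TABLE = load();
    }

    // Stand-in for reading the file from HDFS; the data below is made up.
    private static Map<String, String> load() {
        String data = "10.0.0.1,Springfield/US\n10.0.0.2,Lyon/FR\n";
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    table.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not load geo table", e);
        }
        return table;
    }

    // What the UDF body would call for each row.
    static String lookup(String ip) {
        return Holder.TABLE.getOrDefault(ip, "unknown");
    }
}
```

A UDF would call GeoLookup.lookup(ip) for each row, and the table is built
once per JVM. Nothing in this sketch ever releases what it holds, which
matters once the map is replaced by a memory-mapped file or a stream that
stays open.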

#2) Let's say there is a need to do a lookup on a KV store like Redis from
the executor. Each executor would create a connection pool and hand out
connections to the tasks running in it to perform lookups.

In scenarios like these, when the executor shuts down, what would be the
best way to close the open resources (streams, etc.)?
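
For illustration, one generic JVM-level candidate is a shutdown hook that
closes everything registered with it. The sketch below assumes a
hypothetical ExecutorResources registry and is not tied to any Spark API.
Shutdown hooks run on an orderly JVM exit but not when the process is
killed forcibly, so whether this is adequate for executors is part of the
question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry of resources opened inside one executor JVM.
final class ExecutorResources {
    private static final List<AutoCloseable> OPEN = new ArrayList<>();
    private static boolean hookInstalled = false;

    private ExecutorResources() {}

    // Track a resource; the first call also installs the JVM shutdown hook.
    static synchronized <T extends AutoCloseable> T register(T resource) {
        if (!hookInstalled) {
            Runtime.getRuntime().addShutdownHook(
                new Thread(ExecutorResources::closeAll));
            hookInstalled = true;
        }
        OPEN.add(resource);
        return resource;
    }

    // Close in reverse registration order; keep going if one close fails.
    static synchronized void closeAll() {
        for (int i = OPEN.size() - 1; i >= 0; i--) {
            try {
                OPEN.get(i).close();
            } catch (Exception e) {
                System.err.println("failed to close resource: " + e);
            }
        }
        OPEN.clear();
    }
}
```

The lazily created stream or connection pool would be passed through
ExecutorResources.register(...) at the point where it is first opened.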


Any pointers to places where I could read up a bit more on the best
practices around this would be highly appreciated!

thanks
appu
