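In principle, an executor is a long-lived container: the same JVM runs many tasks, so state held in a static field survives from one task (and one batch) to the next. You could therefore use a simple caching library or a plain singleton to keep the data in your executors once they have loaded it from MySQL. Note, though, that each executor then queries MySQL on its own, so with many executors you might overload your MySQL server.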
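A minimal sketch of the singleton idea in plain Java, assuming one cached copy per executor JVM that is reloaded lazily once it is more than an hour old. `RefreshingCache`, `ValidationRules` and `loadRulesFromMySql` are hypothetical names, not Spark or MySQL APIs; the loader is a placeholder for your own JDBC query.

```java
import java.time.Duration;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: a lazily loaded value that is reloaded once it is older than ttl.
final class RefreshingCache<T> {
    // Value and load time are kept together so readers always see a consistent pair.
    private static final class Snapshot<T> {
        final T value;
        final long loadedAtNanos;
        Snapshot(T value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final long ttlNanos;
    private final Supplier<T> loader;
    private volatile Snapshot<T> snapshot; // null until the first get()

    RefreshingCache(Duration ttl, Supplier<T> loader) {
        this.ttlNanos = ttl.toNanos();
        this.loader = loader;
    }

    private boolean isStale(Snapshot<T> s) {
        return s == null || System.nanoTime() - s.loadedAtNanos >= ttlNanos;
    }

    T get() {
        Snapshot<T> s = snapshot;
        if (isStale(s)) {
            synchronized (this) {
                s = snapshot; // another task thread may have reloaded in the meantime
                if (isStale(s)) {
                    s = new Snapshot<>(loader.get(), System.nanoTime());
                    snapshot = s;
                }
            }
        }
        return s.value;
    }
}

// Hypothetical holder: the static field gives one cache per executor JVM.
final class ValidationRules {
    static final RefreshingCache<Map<String, String>> CACHE =
        new RefreshingCache<>(Duration.ofHours(1), ValidationRules::loadRulesFromMySql);

    private ValidationRules() {}

    // Placeholder for your own JDBC query; should return an immutable collection.
    private static Map<String, String> loadRulesFromMySql() {
        return Map.of("fieldA=x,fieldB=y", "processA");
    }
}
```

Inside your streaming code you would call `ValidationRules.CACHE.get()` from the function that runs on the executors (for example inside `mapPartitions`), so each executor hits MySQL at most about once per hour. Loading lazily on first use rather than on a background timer keeps the sketch simple, at the cost of one slow task per executor per reload.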
Regards
Sab

On 05-Nov-2015 7:03 pm, "Kay-Uwe Moosheimer" <u...@moosheimer.com> wrote:
> I have the following problem.
> We have MySQL and a Spark cluster.
> We need to load 5 different validation instructions (several thousand
> entries each) and use this information on the executors to decide whether
> content from Kafka streaming goes to process a or b.
> The streaming data from Kafka are JSON messages, and the validation info
> from MySQL says "if field a is that and field b is that, then process a",
> and so on.
>
> The tables in MySQL change over time, and we have to reload the data
> every hour.
> I tried to use broadcasting, where I load the data and store it in HashSets
> and HashMaps (Java code), but it's not possible to redistribute the data.
>
> What would be the best way to solve my problem?
> Use native JDBC in the executor task and load the data: can the executor
> store this data in HashSets etc. for the next call, so that I only load the
> data every hour?
> Use other possibilities?