Thanks for the input. Yes, I am using a RichFunction: I manage the objects created in open(), and the network calls are made in run(). However, I am now stuck on getting this same task to run on *all task managers* (nodes). When I submit the job, Task1 (the static data task) runs on only one task manager, even though I have 3 task managers in my Flink cluster.
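
To make the open()/close() pattern concrete, here is a minimal sketch of a refresh helper in plain Java. It is not the actual job code: `StaticDataCache`, its `loader` supplier, and the thread name are hypothetical names chosen for illustration. The assumption is that each parallel subtask of Task2 would create one of these in open() and shut it down in close(), so that every task manager running a subtask holds its own in-memory copy.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: keeps static data in memory and refreshes it on a schedule. */
final class StaticDataCache<T> implements AutoCloseable {
    private final Supplier<T> loader;   // assumption: wraps the HTTPS fetch
    private final long period;
    private final TimeUnit unit;
    private volatile T current;         // latest snapshot, read by the processing thread
    private ScheduledExecutorService scheduler;

    StaticDataCache(Supplier<T> loader, long period, TimeUnit unit) {
        this.loader = loader;
        this.period = period;
        this.unit = unit;
    }

    /** Intended to be called from open(): load once, then schedule periodic refreshes. */
    void start() {
        current = loader.get();
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "static-data-refresh");
            t.setDaemon(true);          // do not keep the JVM alive on shutdown
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refreshQuietly, period, period, unit);
    }

    private void refreshQuietly() {
        try {
            current = loader.get();
        } catch (RuntimeException e) {
            // Keep the previous snapshot. An exception escaping here would
            // cancel all further scheduled refreshes.
        }
    }

    /** Returns the most recently loaded snapshot. */
    T get() {
        return current;
    }

    /** Intended to be called from close(): stop the refresh thread to avoid leaks. */
    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

With a one-hour refresh this would be constructed as `new StaticDataCache<>(loader, 1, TimeUnit.HOURS)`. Because each subtask loads the data itself, nothing has to be broadcast or persisted, at the cost of one external fetch per subtask per refresh.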

On Tue, Jun 14, 2022 at 7:20 PM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi,
>
> IMO, Broadcast is a better way to do this, which can reduce the QPS of
> external access.
> If you do not want to use Broadcast, Try using RichFunction, start a
> thread in the open() method to refresh the data regularly. but be careful
> to clean up your data and threads in the close() method, otherwise it will
> lead to leaks.
>
> Best,
> Weihua
>
>
> On Tue, Jun 14, 2022 at 12:04 AM Great Info <gubt...@gmail.com> wrote:
>
>> Hi,
>> I have one flink job which has two tasks
>> Task1- Source some static data over https and keep it in memory, this
>> keeps refreshing it every 1 hour
>> Task2- Process some real-time events from Kafka and uses static data to
>> validate something and transform, then forward to other Kafka topic.
>>
>> so far, everything was running on the same Task manager(same node), but
>> due to some recent scaling requirements need to enable partitioning on
>> Task2 and that will make some partitions run on other task managers. but
>> other task managers don't have the static data
>>
>> is there a way to run Task1 on all the task managers? I don't want to
>> enable broadcasting since it is a little huge and also I can not persist
>> data in DB due to data regulations.
>>
>>