I don't really understand how task2 reads static data from task1, but I think you could integrate the logic of fetching the static data over HTTP in Task1 into Task2, and keep only one kind of task.
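For illustration, a merged operator might look roughly like the sketch below. The class and method names are my own, and it is written as a plain Java class mirroring the open()/map()/close() lifecycle of a Flink RichMapFunction (without the Flink dependency); fetchStaticData() is a placeholder for the real HTTPS call. The point is that open() runs once per parallel instance, so every task manager that hosts a subtask loads its own copy of the data.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of merging the static-data fetch into the event-processing task.
// Mirrors Flink's RichMapFunction lifecycle (open/map/close) without the
// Flink dependency; fetchStaticData() stands in for the real HTTPS call.
public class MergedValidator {
    private final AtomicReference<Set<String>> staticData = new AtomicReference<>();

    // In Flink this would be open(Configuration): it runs once per parallel
    // instance, i.e. on every task manager that hosts a subtask.
    public void open() {
        staticData.set(fetchStaticData());
    }

    // In Flink this would be map(T value): validate each Kafka event
    // against the locally cached static data.
    public String map(String event) {
        Set<String> data = staticData.get();
        return data.contains(event) ? event + ":valid" : event + ":invalid";
    }

    public void close() {
        staticData.set(null); // release the cached data
    }

    // Placeholder for the HTTPS fetch described in the thread.
    private Set<String> fetchStaticData() {
        return Set.of("a", "b", "c");
    }
}
```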
Best,
Weihua

On Wed, Jun 15, 2022 at 10:07 AM Great Info <gubt...@gmail.com> wrote:

> thanks for helping with some inputs. yes, I am using a rich function and
> handling objects created in open(), and the network calls are made in run().
> but currently, I am stuck getting this same task to run on *all task
> managers* (nodes). when I submit the job, this task1 (static data task)
> runs on only one task manager; I have 3 task managers in my Flink cluster.
>
>
> On Tue, Jun 14, 2022 at 7:20 PM Weihua Hu <huweihua....@gmail.com> wrote:
>
>> Hi,
>>
>> IMO, broadcast is a better way to do this, since it can reduce the QPS
>> of external access.
>> If you do not want to use broadcast, try using a RichFunction: start a
>> thread in the open() method to refresh the data regularly, but be careful
>> to clean up your data and threads in the close() method, otherwise it
>> will lead to leaks.
>>
>> Best,
>> Weihua
>>
>>
>> On Tue, Jun 14, 2022 at 12:04 AM Great Info <gubt...@gmail.com> wrote:
>>
>>> Hi,
>>> I have one Flink job which has two tasks:
>>> Task1 - sources some static data over HTTPS and keeps it in memory,
>>> refreshing it every 1 hour.
>>> Task2 - processes real-time events from Kafka, uses the static data to
>>> validate and transform them, then forwards them to another Kafka topic.
>>>
>>> so far, everything was running on the same task manager (same node),
>>> but due to some recent scaling requirements I need to enable
>>> partitioning on Task2, and that will make some partitions run on other
>>> task managers. but the other task managers don't have the static data.
>>>
>>> is there a way to run Task1 on all the task managers? I don't want to
>>> enable broadcasting since the data is a little large, and I also cannot
>>> persist it in a DB due to data regulations.
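The refresh-thread approach Weihua describes above (start a thread in open(), clean it up in close() to avoid leaks) might look roughly like this. The class name, the use of ScheduledExecutorService, and the fetch() placeholder are my own choices, not anything prescribed by Flink; the sketch only illustrates the lifecycle discipline.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the "refresh in a background thread" pattern: a scheduled
// task re-fetches the static data every hour, and close() shuts the
// executor down so the thread does not leak when the job is cancelled.
public class RefreshingCache {
    private final AtomicReference<Set<String>> data = new AtomicReference<>();
    private ScheduledExecutorService scheduler;

    public void open() {
        data.set(fetch());                       // initial, blocking load
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "static-data-refresh");
            t.setDaemon(true);                   // don't block JVM shutdown
            return t;
        });
        // refresh every hour, matching the cadence described in the thread
        scheduler.scheduleAtFixedRate(() -> data.set(fetch()),
                1, 1, TimeUnit.HOURS);
    }

    public Set<String> current() {
        return data.get();
    }

    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();             // stop the refresh thread
        }
        data.set(null);                          // drop the cached data
    }

    // Placeholder for the real HTTPS fetch.
    private Set<String> fetch() {
        return Set.of("a", "b");
    }
}
```

Forgetting the shutdownNow() in close() is exactly the leak Weihua warns about: the refresher thread would keep running (and keep hitting the external service) after the operator is disposed.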