Thanks for the input. Yes, I am using a rich function: the objects are
created in open(), and the network calls are made in run().
However, I am now stuck trying to run this same task on *all task managers*
(nodes). When I submit the job, Task1 (the static data task) runs on only
one task manager, although I have 3 task managers in my Flink cluster.
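
For reference, the open()/close() refresh pattern discussed in this thread
could be sketched roughly as below. This is only a minimal sketch in plain
Java with no Flink dependency: the class name RefreshingCache, the loader
Supplier (standing in for the HTTPS fetch) and the period are all
hypothetical names picked for illustration, not code from the actual job.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: keeps static data in memory and refreshes it periodically. */
class RefreshingCache<T> {
    private final Supplier<T> loader;   // assumed stand-in for the HTTPS fetch
    private final long periodMillis;    // refresh interval, e.g. one hour
    private volatile T current;         // latest snapshot, read by the processing code
    private ScheduledExecutorService scheduler;

    RefreshingCache(Supplier<T> loader, long periodMillis) {
        this.loader = loader;
        this.periodMillis = periodMillis;
    }

    /** Intended to be called from the rich function's open(). */
    synchronized void open() {
        current = loader.get(); // load once up front so data exists before events arrive
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "static-data-refresh");
            t.setDaemon(true); // do not keep the JVM alive because of this thread
            return t;
        });
        scheduler.scheduleAtFixedRate(
                this::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
        T fresh;
        try {
            fresh = loader.get();
        } catch (RuntimeException e) {
            // Keep the previous snapshot; an uncaught exception would cancel later runs.
            return;
        }
        synchronized (this) {
            if (scheduler != null) { // ignore results that arrive after close()
                current = fresh;
            }
        }
    }

    T get() {
        return current;
    }

    /** Intended to be called from the rich function's close(), to avoid leaks. */
    synchronized void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
        current = null;
    }
}
```

Since Flink calls open() once per parallel instance of a rich function,
holding such a cache inside the Task2 function (rather than in a separate
Task1) would load the data on every task manager that hosts a Task2
subtask.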


On Tue, Jun 14, 2022 at 7:20 PM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi,
>
> IMO, Broadcast is a better way to do this, since it reduces the QPS of
> external access.
> If you do not want to use Broadcast, try using a RichFunction and start a
> thread in the open() method to refresh the data regularly. Be careful
> to clean up your data and threads in the close() method; otherwise they
> will leak.
>
> Best,
> Weihua
>
>
> On Tue, Jun 14, 2022 at 12:04 AM Great Info <gubt...@gmail.com> wrote:
>
>> Hi,
>> I have one Flink job with two tasks:
>> Task1 - Sources some static data over HTTPS and keeps it in memory,
>> refreshing it every hour.
>> Task2 - Processes real-time events from Kafka, uses the static data to
>> validate and transform them, then forwards them to another Kafka topic.
>>
>> So far, everything has been running on the same task manager (same
>> node), but due to some recent scaling requirements I need to enable
>> partitioning on Task2, which will make some partitions run on other task
>> managers. Those task managers don't have the static data.
>>
>> Is there a way to run Task1 on all the task managers? I don't want to
>> enable broadcasting since the data is fairly large, and I cannot persist
>> it in a DB due to data regulations.
>>
>>
