Hi Team,

We are planning to run the pipeline below as a standalone Flink application
cluster on Kubernetes, and we would appreciate the community's insights on
the following questions.

[image: image.png]
We can describe the pipeline as follows:

   1. Combine the real-time stream from S1 and the enrichment data from S2
   and S3 using the union operator. Key the combined stream by value1 so
   that the enrichment data is locally available for each key.
   2. Broadcast the rules from S4 that are used to process the data.
   3. Connect the two streams from (1) and (2) and, inside a keyed
   broadcast process function, process the real-time events from S1 using
   the enrichment data from S2 and S3 kept in RocksDB keyed state,
   according to the rules kept in broadcast state (see the sketch below
   the list).
   4. Produce the transformed results to a Kafka Sink.
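
To make the wiring concrete, here is a rough sketch of the topology we have
in mind (Event, Rule, Result, the source/sink builders and
EnrichAndTransformFunction are placeholders for our actual classes, and we
assume the Flink 1.14+ KafkaSource/KafkaSink API):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Real-time events, enrichment data and rules (placeholder sources).
DataStream<Event> s1 = env.fromSource(s1Source, WatermarkStrategy.noWatermarks(), "S1");
DataStream<Event> s2 = env.fromSource(s2Source, WatermarkStrategy.noWatermarks(), "S2");
DataStream<Event> s3 = env.fromSource(s3Source, WatermarkStrategy.noWatermarks(), "S3");
DataStream<Rule>  s4 = env.fromSource(s4Source, WatermarkStrategy.noWatermarks(), "S4");

// (1) Union the three streams and key by value1 so the enrichment data is
//     co-located with the events it enriches.
KeyedStream<Event, String> keyed = s1.union(s2, s3).keyBy(Event::getValue1);

// (2) Broadcast the rules from S4.
MapStateDescriptor<String, Rule> rulesDescriptor =
        new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(Rule.class));
BroadcastStream<Rule> rules = s4.broadcast(rulesDescriptor);

// (3) Connect the keyed stream with the broadcast stream; enrichment data is
//     kept in RocksDB keyed state, rules in broadcast state.
DataStream<Result> results =
        keyed.connect(rules).process(new EnrichAndTransformFunction());

// (4) Produce the transformed results to Kafka.
results.sinkTo(kafkaSink);

env.execute("enrichment-pipeline");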

Note: Kafka Source S1 has 32 partitions. Suppose we have 1 million distinct
keys and expect 10k events/s from S1.

Approach 1: Application cluster with 16 task managers. Each task manager
has 2 slots and 2 CPUs.
Approach 2: Application cluster with 2 task managers. Each task manager has
16 slots and 16 CPUs.

*Questions*

   - Which approach is suitable for a standalone deployment on Kubernetes?
   Are there any best practices for running Flink applications on K8s?
   - We are planning to connect the sources S1, S2 and S3 using the union
   operator (as in the sketch above). These sources have different
   parallelism settings, each equal to the number of available Kafka
   partitions, and the downstream process function has the same parallelism
   as the real-time Kafka source S1. Is it a good idea to apply union on
   streams with different parallelisms?
   - The size of the broadcast state is around 20 MB, so the checkpointed
   size of the broadcast state will be about 640 MB (operator parallelism *
   state size, i.e. 32 * 20 MB). Every event requires the entire rule set
   for processing, so keeping the rules in RocksDB keyed state is not an
   option. Is it a good approach to keep such a large state in broadcast
   state?
   - Is it a good practice to use a singleton pattern in Flink to build a
   local cache of the rules inside the open() method of the process
   function (a rough sketch of what we mean follows the questions)? If the
   data is lost due to a restart, I can repopulate it with an external
   call. Can such local caches (created inside open()) safely be kept for
   the entire lifetime of a particular pod/task manager?
   - Is there any relation between incremental checkpoints and the maximum
   number of retained completed checkpoints (state.checkpoints.num-retained)?
   - If I enable incremental checkpoints for the RocksDB state backend and
   set the maximum number of retained checkpoints to 1 (see the config
   sketch below), will the entire state still be checkpointed every time,
   irrespective of the delta between checkpoints?
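
Regarding the local-cache question, this is roughly what we have in mind
(Event, Rule, Result and RulesClient are placeholders; the cache is plain
JVM state, not Flink state):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

// JVM-wide singleton cache shared by all subtasks in the same task manager,
// (re)populated from an external service, e.g. after a pod restart.
final class RuleCache {
    private static volatile Map<String, Rule> rules;

    static Map<String, Rule> getInstance() {
        if (rules == null) {
            synchronized (RuleCache.class) {
                if (rules == null) {
                    Map<String, Rule> loaded = new ConcurrentHashMap<>();
                    loaded.putAll(RulesClient.fetchAllRules()); // external call
                    rules = loaded;
                }
            }
        }
        return rules;
    }
}

public class EnrichAndTransformFunction
        extends KeyedBroadcastProcessFunction<String, Event, Rule, Result> {

    private transient Map<String, Rule> localRules; // not checkpointed by Flink

    @Override
    public void open(Configuration parameters) {
        localRules = RuleCache.getInstance(); // built once per JVM, reused by all slots
    }

    @Override
    public void processElement(Event event, ReadOnlyContext ctx, Collector<Result> out) {
        Rule rule = localRules.get(event.getRuleId());
        // ... enrich using RocksDB keyed state and apply the rule ...
    }

    @Override
    public void processBroadcastElement(Rule rule, Context ctx, Collector<Result> out) {
        localRules.put(rule.getId(), rule); // keep the cache in sync with S4
    }
}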
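
And for the checkpointing questions, this is how we currently configure the
state backend (assuming the Flink 1.13+ EmbeddedRocksDBStateBackend API;
state.checkpoints.num-retained is set on the cluster side):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// true = incremental checkpoints, so only the RocksDB delta should be uploaded.
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
env.enableCheckpointing(60_000); // e.g. every 60 s

// flink-conf.yaml (cluster side):
//   state.checkpoints.num-retained: 1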

Thanks
Jessy
