Understanding pipelined regions

2022-12-19 Thread Raihan Sunny
Hi, I'm quite new to the world of stream and batch processing. I've been reading about pipelined regions in Flink and am quite confused by what they mean. My specific problem involves a streaming job that looks like the following: 1. There is a Kafka source that takes in input data that sets of

Windowing query with group by produces update stream

2022-12-19 Thread Theodor Wübker
Hey everyone, I would like to run a Windowing-SQL query with a group-by clause on a Kafka topic and write the result back to Kafka. Right now, the program always says that I am creating an update-stream that can only be written to an Upsert-Kafka-Sink. That seems odd to me, because running my g

aws glue connector

2022-12-19 Thread Katz, David L via user
Hi- Has anyone used a Glue table connector (specifically trying to get a streaming Glue table that sits on top of a Kinesis data stream)? I have been using the Kinesis stream connector but want to integrate with Lake Formation, so would like to take advantage of the Glue table layer. Thanks, -

Re: Reduce checkpoint-induced latency by tuning the amount of resource available for checkpoints

2022-12-19 Thread Robin Cassan via user
That's fantastic, thanks a lot for the info, I will definitely try that! Cheers, Robin On Mon, Dec 19, 2022 at 13:16, Hangxiang Yu wrote: > Hi, Robin. > If you have upgraded to 1.16, I think your problem will be > solved automatically because restore modes have been supported since 1.15. >

Re: Rocksdb Incremental checkpoint

2022-12-19 Thread Hangxiang Yu
Hi, IIUC, numRetainedCheckpoints will only influence the space overhead of the checkpoint dir, but not the incremental size. RocksDB executes incremental checkpoints based on the shared directory, which will always retain as many SST files as possible (maybe it's from the last checkpoint, or maybe from lo

Rocksdb Incremental checkpoint

2022-12-19 Thread Puneet Duggal
Hi, After going through the following article regarding RocksDB incremental checkpoints (https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html), my understanding was that at each checkpoint, Flink only checkpoints newly created SSTables whereas other it can reference from

Re: Reduce checkpoint-induced latency by tuning the amount of resource available for checkpoints

2022-12-19 Thread Hangxiang Yu
Hi, Robin. If you have upgraded to 1.16, I think your problem will be solved automatically because restore modes have been supported since 1.15. The NO_CLAIM mode is the default restore mode [1], which will help you break the lineage of snapshots (both checkpoints and savepoints). When you use t

Re: Reduce checkpoint-induced latency by tuning the amount of resource available for checkpoints

2022-12-19 Thread Robin Cassan via user
Hey Hangxiang! Thanks a lot for your answer. Indeed, we are currently using Flink 1.13 and plan on moving to 1.16 soon, so it's great news that the non-incremental checkpoints were optimized, thanks for sharing! We decided not to use incremental checkpoints due to the fact that it was hard to expire

Flink reactive mode for application clusters on AWS EKS

2022-12-19 Thread Tamir Sagi
Hey, We are running stream jobs on application clusters (v1.15.2) on AWS EKS. I was reviewing the following pages on Flink confluence * Reactive mode [1] * Adaptive Scheduler [2] I also encountered the following POC conducted by Robert Metzger (@rmetzger_