Flink Kafka Consumer performance issue

2022-10-02 Thread Xin Ma
Hi, (Flink version 1.14.2, Kafka version 2.6.1) I have a Flink job that consumes from Kafka and simply sinks the data into S3. The Kafka consumer is sometimes delayed on a few partitions. The partitions are evenly assigned to the Flink subtasks. I found there was a correlation between Kafka consumer fet

StreamingFileSink & checkpoint tuning

2022-06-30 Thread Xin Ma
Hi, I recently encountered an issue while using StreamingFileSink. I have a Flink job that consumes records from various sources and writes them to S3 with StreamingFileSink. But the job sometimes fails due to checkpoint timeout, and the root cause is checkpoint alignment failure as there is data skewnes

flink kafka producer avro serializer problem

2021-06-17 Thread Xin Ma
Hello, Currently, I am using the Confluent Kafka Avro serializer to write to Kafka and, at the same time, register the schema with the Confluent Schema Registry. The problem is that our upstream data is deserialized from msgpack and converted to a hashmap, which Avro cannot serialize. The map include

Re: Should I use static database connection pool?

2019-10-16 Thread Xin Ma
nice to have.

> On Wed, 16 Oct 2019 at 10:58, miki haiat wrote:
>
>> If it's a sink that uses JDBC, why not use the Flink JdbcSink connector?
>>
>> On Wed, Oct 16, 2019, 17:03 Xin Ma wrote:
>>
>>> I have watched one of the recent Flink

Should I use static database connection pool?

2019-10-16 Thread Xin Ma
I have watched one of the recent Flink Forward videos, Apache Flink Worst Practices by Konstantin Knauf. The talk helped me a lot, and it mentions that we should avoid using static variables to share state between tasks. So should I also avoid a static database connection? Because I am facing a weird iss