RE: [DISCUSS] FLIP-511: Support transaction id pooling in Kafka connector

David Radley Wed, 26 Mar 2025 09:28:30 -0700

Hi Arvid,
This sounds like a good idea. I wondered:

  *   Can we check that we have the appropriate permissions and the appropriate 
Kafka level(at least API key 66)  to validate this option.
  *   I am wondering what we do for the 2 errors that the Kafka API can give 
namely
     *   COORDINATOR_LOAD_IN_PROGRESS: The coordinator is in the process of 
loading its state.
     *   COORDINATOR_NOT_AVAILABLE: If the coordinator receiving the request is 
being shutdown.=

In the case of COORDINATOR_LOAD_IN_PROGRESS I guess we could have some sort of 
retry loop – governed by a configuration parameter.

  *   Do we need to do anything special for find-hanging transactions?
  *   Reusing transaction ids could be confusing in debugging.  I wonder if we 
should not eagerly reuse transaction ids so the debugging would be more 
straight forward. Better still cpi;d be use a uuid as part of the identifier; 
this would seem more normal practise rather than managing an incrementing 
counter and filling the gaps as they become available.
  *   I am not very familiar with the KIP - I am unsure what you mean by “ it 
also knows which transactions still need to be re-committed. It aborts all 
other open transactions.”. When/why do we need to re-commit transactions and 
when do we abort?
  *   I wonder why “we expect 3 transactional ids to be in use per subtask”.
  *   I wonder why “There is at least one finalized transaction waiting to be 
committed” Do you mean the transaction is in state CompleteCommit? Can we not 
be the first transaction so have no older ones – or have I misunderstood?

Kind regards, David.

From: Arvid Heise <ahe...@confluent.io.INVALID>
Date: Wednesday, 26 March 2025 at 06:49
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-511: Support transaction id pooling in 
Kafka connector
Hi Hongshun,

this thread is about addressing your first pain point. It would be
awesome if you could read the FLIP and provide feedback whether this
approach would work in your setup.

The other issues won't need a FLIP but need a smaller bugfix.

Best,

Arvid

On Tue, Mar 25, 2025 at 7:57 AM Hongshun Wang <loserwang1...@gmail.com> wrote:
>
> Hi heise,
> Thanks for pushing this job. Currently, kafka sinks in transactions are not
> productive, which will influence the whole kafka cluster.
> As we discussed before, I have three problems when using Kafka
>  exactly-once sink :
> 1. Too much transaction id https://issues.apache.org/jira/browse/FLINK-34554
> 2. Too much unclosed producer ( if commit is lag too much from pre-commit:
> https://issues.apache.org/jira/browse/FLINK-36569
> 3. maybe Invalid state: https://issues.apache.org/jira/browse/FLINK-37356
>
> I do hope this flip can show how to solve this problem.
>
> Best
> Hongshun Wang
>
>
>
> On Fri, Feb 28, 2025 at 10:01 PM Arvid Heise <ar...@apache.org> wrote:
>
> > Dear Flink devs,
> >
> > I'd like to start discussion around an improvement of the exactly-once
> > Kafka sink. Because of its current design, the sink currently puts a
> > large strain on the main memory of the Kafka broker [1], which hinders
> > adoption of the KafkaSink. Since the KafkaSink will be the only way to
> > produce into Kafka for Flink 2.0, this limitation may in turn also
> > limit Flink 2 adoption.
> >
> > I appreciate feedback not only on technical aspects but also on the
> > requirements of the new approach [2]:
> > - Are most users/customers already on Kafka 3.0 broker?
> > - The new approach requires READ permission on the target topic (for
> > DescribeProducer API) and DESCRIBE permission on the transaction (for
> > ListTransaction API). Are there any concerns around this?
> >
> > The new behavior would be opt-in in the first phase while the current
> > behavior remains the DEFAULT. If the community accepts the new
> > behavior, we can turn it on by default and provide an option to
> > opt-out.
> >
> > Note that this improvement is borderline FLIP-worthy; it touches the
> > public interfaces of the connector only. However, by doing an official
> > FLIP, I'm positive that we can gather feedback more quickly.
> >
> > I'm also looking for volunteers that test the implementation on their
> > dev clusters [3].
> >
> > Best,
> >
> > Arvid
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-34554
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-511%3A+Support+transaction+id+pooling+in+Kafka+connector
> > [3] https://github.com/apache/flink-connector-kafka/pull/154
> >

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: Building C, IBM Hursley Office, Hursley Park Road, 
Winchester, Hampshire SO21 2JN

RE: [DISCUSS] FLIP-511: Support transaction id pooling in Kafka connector

Reply via email to