Hi Artem, I saw your response in the thread I started about Kafka distributed transaction support and the XA interface. I would like to work with you to add XA support to Kafka on top of the excellent foundational work that you have started with KIP-939. I agree that explicit XA support should not be included in the Kafka codebase as long as the right set of basic operations is provided. I will begin pulling together a KIP to follow KIP-939.
I did have one comment on KIP-939 itself. I see that you considered an explicit "prepare" RPC, but decided not to add it. If I understand your design correctly, that would mean that a 2PC transaction would have a single timeout that would need to be long enough to ensure that prepared transactions are not aborted when an external coordinator fails. However, this also means that an unprepared transaction would not be aborted without waiting for the same timeout. Since long-running transactions block transactional consumers, having a long timeout for all transactions could be disruptive. An explicit "prepare" RPC would allow the server to abort unprepared transactions after a relatively short timeout, and apply a much longer timeout only to prepared transactions. The explicit "prepare" RPC would make the Kafka server more resilient to client failure at the cost of an extra synchronous RPC call. I think it's worth reconsidering this. With an XA implementation this might become a more significant issue, since the external transaction coordinator has no memory of unprepared transactions across restarts. Such transactions would need to be cleared by hand through the admin client even when the transaction coordinator restarts successfully.

- Rowland
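
P.S. To make the timeout argument concrete, here is a rough sketch of the abort policy I have in mind. It is not taken from KIP-939 or from the Kafka codebase: the class name, the state names, and the two timeout values (1 minute and 24 hours) are assumptions chosen purely for illustration.

```java
import java.time.Duration;

// Hypothetical sketch, not Kafka code: all names and values here are illustrative assumptions.
public class TxnTimeoutSketch {
    // ONGOING = no "prepare" RPC received yet; PREPARED = "prepare" RPC acknowledged.
    enum State { ONGOING, PREPARED }

    // Assumed values: a short timeout before prepare, a much longer one after.
    static final Duration UNPREPARED_TIMEOUT = Duration.ofMinutes(1);
    static final Duration PREPARED_TIMEOUT = Duration.ofHours(24);

    // Returns true if the server should abort a transaction in the given state
    // that has been open for the given length of time.
    static boolean shouldAbort(State state, Duration age) {
        Duration limit = (state == State.PREPARED) ? PREPARED_TIMEOUT : UNPREPARED_TIMEOUT;
        return age.compareTo(limit) > 0;
    }

    public static void main(String[] args) {
        Duration tenMinutes = Duration.ofMinutes(10);
        // A client that died before prepare stops blocking consumers quickly.
        System.out.println(shouldAbort(State.ONGOING, tenMinutes));  // prints "true"
        // A prepared transaction survives a lengthy external coordinator outage.
        System.out.println(shouldAbort(State.PREPARED, tenMinutes)); // prints "false"
    }
}
```

With a single timeout, both calls would have to return the same answer, which is the trade-off I described above.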