Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

Bruno Cadonna Thu, 22 Jun 2023 04:00:28 -0700

Hi Nick,

1.

Yeah, I agree with you. That was actually also my point. I understood that John was proposing the ingestion path as a way to avoid the early commits. Probably I misinterpreted the intent.

2.

I agree with John here that it actually is public API. My question is how this usage pattern affects normal processing.

3.

My concern is that checking the size of the transaction buffer and maybe triggering an early commit affects the whole processing of Kafka Streams. The transactionality of a state store is not confined to the state store itself, but spills over and changes the behavior of other parts of the system. I agree with you that it is a decent compromise. I just wanted to analyse the downsides and list the options to overcome them. I also agree with you that all options seem quite heavy compared with your KIP. I do not understand what you mean by "less predictable for users", though.
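A minimal sketch of the coupling described above, assuming (hypothetically) a tracker that sums uncommitted bytes across all stores on a thread and compares the total against a single configured limit (the thread discusses names such as statestore.uncommitted.max.bytes and statestore.transaction.buffer.max.bytes). The class and method names are illustrative only and are not Kafka Streams API. In this sketch, one busy store can push the shared total over the limit and so force an early commit of everything the tracker covers, which is the spill-over effect described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these names do not exist in Kafka Streams.
final class TransactionBufferTracker {
    // Configured limit on uncommitted bytes across all tracked stores.
    private final long maxBufferBytes;
    // Uncommitted bytes currently buffered, per state store name.
    private final Map<String, Long> uncommittedBytes = new HashMap<>();

    TransactionBufferTracker(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    // Called whenever a record is written into a store's transaction buffer.
    void recordWrite(String storeName, int keyBytes, int valueBytes) {
        uncommittedBytes.merge(storeName, (long) keyBytes + valueBytes, Long::sum);
    }

    long totalUncommittedBytes() {
        return uncommittedBytes.values().stream().mapToLong(Long::longValue).sum();
    }

    // Checked by the processing loop after each record: when true, the thread
    // would commit now rather than waiting for commit.interval.ms.
    boolean earlyCommitNeeded() {
        return totalUncommittedBytes() >= maxBufferBytes;
    }

    // After a commit all buffers have been flushed, so the counts reset.
    void onCommit() {
        uncommittedBytes.clear();
    }

    public static void main(String[] args) {
        TransactionBufferTracker tracker = new TransactionBufferTracker(100);
        tracker.recordWrite("store-a", 10, 40);
        System.out.println(tracker.earlyCommitNeeded()); // false (50 of 100 bytes)
        tracker.recordWrite("store-b", 20, 30);
        System.out.println(tracker.earlyCommitNeeded()); // true (100 of 100 bytes)
        tracker.onCommit();
        System.out.println(tracker.earlyCommitNeeded()); // false
    }
}
```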

I found the discussions about the alternatives really interesting. But I also think that your plan sounds good and we should continue with it!



Some comments on your reply to my e-mail on June 20th:

3.

Ah, now I understand the reasoning behind putting the isolation level in the state store context. Thanks! Should that also be a way to give the state store the opportunity to decide whether to turn on transactions or not?

With my comment, I was more concerned about how you know whether a checkpoint file needs to be written under EOS if you do not have a way to know whether the state store is transactional. If a state store is transactional, the checkpoint file can be written during normal processing under EOS. If the state store is not transactional, the checkpoint file must not be written under EOS.
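The checkpointing rule in the last two sentences can be written as a small decision function. This is a hypothetical sketch, not Kafka Streams API: the class name, method name and boolean inputs are illustrative, and it assumes checkpointing outside EOS is unchanged by this KIP.

```java
// Hypothetical sketch of the checkpointing rule; not Kafka Streams API.
final class CheckpointRule {

    // Returns true if the checkpoint file (the changelog offsets the local
    // state corresponds to) may be written during normal processing.
    static boolean mayCheckpointDuringProcessing(boolean eosEnabled,
                                                 boolean storeIsTransactional) {
        if (!eosEnabled) {
            // Assumption: checkpointing outside EOS is unchanged by this KIP.
            return true;
        }
        // After a crash, a checkpoint tells Kafka Streams to trust the local
        // state, so under EOS it may only be written when that state holds
        // nothing beyond committed data. A transactional store guarantees this;
        // a non-transactional store may already have uncommitted writes on disk.
        return storeIsTransactional;
    }

    public static void main(String[] args) {
        System.out.println(mayCheckpointDuringProcessing(true, true));   // true
        System.out.println(mayCheckpointDuringProcessing(true, false));  // false
        System.out.println(mayCheckpointDuringProcessing(false, false)); // true
    }
}
```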

7.

My point was about considering not only the bytes in memory in config statestore.uncommitted.max.bytes, but also bytes that might be spilled to disk. Basically, I was wondering whether you should remove the "memory" in "Maximum number of memory bytes to be used to buffer uncommitted state-store records." My thinking was that even if a state store spills uncommitted bytes to disk, limiting the overall bytes might make sense. Thinking about it again and considering the recent discussions, it does not make much sense anymore.

I like the name statestore.transaction.buffer.max.bytes that you proposed.

8.

A high-level description (without implementation details) of how Kafka Streams will manage the commit of changelog transactions, state store transactions and checkpointing would be great. It would also be great if you could add some sentences about the behavior in case of a failure, for instance how a transactional state store recovers after a failure or what happens with the transaction buffer (that is what I meant by "fail-over" in point 9).


Best,
Bruno

On 21.06.23 18:50, Nick Telford wrote:

Hi Bruno,

1.
Isn't this exactly the same issue that WriteBatchWithIndex transactions
have, whereby exceeding (or likely to exceed) configured memory needs to
trigger an early commit?

2.
This is one of my big concerns. Ultimately, any approach based on cracking
open RocksDB internals and using it in ways it's not really designed for is
likely to have some unforeseen performance or consistency issues.

3.
What's your motivation for removing these early commits? While not ideal, I
think they're a decent compromise to ensure consistency whilst maintaining
good and predictable performance.
All 3 of your suggested ideas seem *very* complicated, and might actually
make behaviour less predictable for users as a consequence.

I'm a bit concerned that the scope of this KIP is growing a bit out of
control. While it's good to discuss ideas for future improvements, I think
it's important to narrow the scope down to a design that achieves the most
pressing objectives (constant sized restorations during dirty
close/unexpected errors). Any design that this KIP produces can ultimately
be changed in the future, especially if the bulk of it is internal
behaviour.

I'm going to spend some time next week trying to re-work the original
WriteBatchWithIndex design to remove the newTransaction() method, such that
it's just an implementation detail of RocksDBStore. That way, if we want to
replace WBWI with something in the future, like the SST file management
outlined by John, then we can do so with little/no API changes.
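A minimal sketch of that direction, assuming a simplified string-keyed store: the store interface exposes only put/get/commit, and the uncommitted-write buffer is private to the implementation. All names are hypothetical, and two TreeMaps stand in for the RocksDB data and the WriteBatchWithIndex; a real RocksDB-backed store would apply the batch in one atomic write.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical implementation in which the transaction is an internal detail.
final class BufferedStoreSketch implements SketchStore {
    // Stands in for the committed data in RocksDB.
    private final Map<String, String> committed = new TreeMap<>();
    // Stands in for a WriteBatchWithIndex: uncommitted writes, null = tombstone.
    private final Map<String, String> uncommitted = new TreeMap<>();

    @Override
    public void put(String key, String value) {
        uncommitted.put(key, value);
    }

    @Override
    public String get(String key) {
        // Read-your-own-writes: an uncommitted write or delete shadows committed data.
        if (uncommitted.containsKey(key)) {
            return uncommitted.get(key);
        }
        return committed.get(key);
    }

    @Override
    public void commit() {
        // In a RocksDB-backed store this would be one atomic write of the batch.
        for (Map.Entry<String, String> e : uncommitted.entrySet()) {
            if (e.getValue() == null) {
                committed.remove(e.getKey());
            } else {
                committed.put(e.getKey(), e.getValue());
            }
        }
        uncommitted.clear();
    }

    // Simulates losing the buffer on a crash: only committed data remains.
    void discardUncommitted() {
        uncommitted.clear();
    }

    public static void main(String[] args) {
        BufferedStoreSketch store = new BufferedStoreSketch();
        store.put("a", "1");
        store.commit();
        store.put("a", "2");
        System.out.println(store.get("a")); // 2 (uncommitted write is visible)
        store.discardUncommitted();
        System.out.println(store.get("a")); // 1 (committed value survives)
    }
}

// Hypothetical store interface: note there is no newTransaction() method.
interface SketchStore {
    void put(String key, String value); // a null value marks a delete
    String get(String key);
    void commit();                       // invoked by the runtime on task commit
}
```

Because callers never see the buffer, replacing the TreeMap (or a WriteBatchWithIndex) with another mechanism would not change the interface.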

Regards,

Nick
