Bump - hoping someone has some insight. Alternatively, a redirection to a more suitable forum would be appreciated.
Thanks

On Sun, Feb 3, 2019 at 10:25 AM Adam Bellemare <adam.bellem...@gmail.com>
wrote:

> Hey Folks
>
> I have a few questions around the operations of stateful processing while
> scaling nodes up/down, and a possible KIP in question #4. Most of them have
> to do with task processing during rebuilding of state stores after scaling
> nodes up.
>
> Scenario:
> Single node/thread, processing 2 topics (10 partitions each):
> User event topic (events) - i.e. key: userId, value: ProductId
> Product topic (entity) - i.e. key: ProductId, value: productData
>
> My topology looks like this:
>
> KTable productTable = ... //materialize from product topic
>
> KStream output = userStream
>     .map(x => (x.value, x.key) ) //Swap the key and value around
>     .join(productTable, ... ) //Joiner is not relevant here
>     .to(...) //Send it to some output topic
>
>
> Here are my questions:
> 1) If I scale the processing node count up, partitions will be rebalanced
> to the new node. Does processing continue as normal on the original node,
> while the new node's processing is paused as the internal state stores are
> rebuilt/reloaded? From my reading of the code (and my own experience) I
> believe this to be the case, but I am just curious in case I missed
> something.
>
> 2) What happens to the userStream map task? Will the new node be able to
> process this task while the state store is rebuilding/reloading? My reading
> of the code suggests that this map process will be paused on the new node
> while the state store is rebuilt. The effect of this is a delay in events
> reaching the original node's partitions, which will be seen as
> late-arriving events. Am I right in this assessment?
>
> 3) How does scaling up work with standby state-store replicas? From my
> reading of the code, it appears that scaling a node up will result in a
> rebalance, with the state assigned to the new node being rebuilt first
> (leading to a pause in processing). Following this, the standby replicas are
> populated. Am I correct in this reading?
>
> 4) If my reading in #3 is correct, would it be possible to pre-populate
> the standby stores on scale-up before initiating active-task transfer? This
> would allow seamless scale-up and scale-down without requiring any pauses
> for rebuilding state. I am interested in kicking this off as a KIP if so,
> but would appreciate any JIRAs or related KIPs to read up on prior to
> digging into this.
>
>
> Thanks
>
> Adam Bellemare
>
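
For anyone unfamiliar with the standby replicas mentioned in #3 and #4: as far as I understand, they are enabled through the `num.standby.replicas` Streams setting, which defaults to 0. Below is a minimal sketch of such a configuration. The class name, application id and broker address are hypothetical placeholders, and the sketch uses only `java.util.Properties`, so it builds the settings without starting a Streams application.

```java
import java.util.Properties;

// Hypothetical sketch: the class name and the values below are placeholders,
// not taken from a real deployment.
class StandbyConfigSketch {

    // Builds the properties that would be handed to a Kafka Streams application.
    static Properties buildConfig() {
        Properties props = new Properties();
        props.setProperty("application.id", "user-product-join");  // placeholder id
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder broker
        // Ask for one standby copy of each state store on another instance.
        // The default is 0, meaning no standby replicas.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        // Print the standby setting to confirm what would be passed along.
        System.out.println(buildConfig().getProperty("num.standby.replicas"));
    }
}
```

With a setting like this, the question in #4 is whether the new node's standby copies could be caught up before the active tasks are moved over to it.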