On Mon, Apr 29, 2019 at 1:57 PM Joe F <joefranc...@gmail.com> wrote: > > I have suggestions for an alternate solution. > > If source message-ids were known for replicated messages, a composite > cursor can be maintained for replicated subscriptions as an n-tuple. Since > messages are ordered from a source, it would be possible to restart from a > known cursor n-tuple in any cluster by a combination of cursor > positioning _and_ filtering
Knowing the source message id alone is not enough to establish the order relationship across all the clusters. I think that would only work in the 2 clusters scenario. In general, for each message being written in region A, both locally published or replicated from other regions, we'd have to associate the highest (original from the source) message id. While that could be easy in the simplest case (broker maintains hashmap of the highest source message id from each region), it becomes more difficult to consider failure cases in which we have to reconstruct that hashmap from the log. Also, that would require to modify each message before persisting it, in the target region. Finally, the N-tuple of message ids would have to either: * Have a mapping in the broker (local-message -> N-tuple) * Be pushed to consumers so that they will ack with the complete context > > A simple way to approximate this is for each cluster to insert its own > ticker marks into the topic. A ticker carries the messsage id as the > message body. The ticker mark can be inserted every 't ' time interval or > every 'n' messages as needed. > > The n-tuple of the tickers from each cluster is a well-known state that > can be re-started anywhere by proper positioning and filtering > > That is a simpler solution for users to understand and trouble-shoot. It > would be resilient to cluster failures, and does NOT require all clusters > to be up, to determine cursor position. No cross-cluster > communication/ordering is needed. I don't think this approach can remove the requirement of "all the clusters to be up" because the information in A won't have any context on what was exchanged between B and C. One quick example: * Message c2 was replicated to B but not A. Then C goes offline. * A tuple will be something like (a3, b4, c1) * In region B, b4 was actually written *after* c2, but c2 doesn't get sent from B to A, since it was already replicated itself. * If we do a failover from A to B considering a3 ~= b4 we would be missing the message c2