Hi Chris, thanks for the comments, replies inline:
On Sat, Mar 23, 2019 at 5:53 AM Chris Bartholomew
<c_bartholo...@yahoo.com.invalid> wrote:
>
> Hi,
>
> I have a few thoughts that are hopefully useful. I have some experience
> testing a similar feature for a proprietary messaging system and understand
> that system's design well, which is similar to this proposal.
>
> Would it make sense to have the snapshot frequency based on messages, not
> time, or to allow both modes? Many messages can be processed in a second,
> which could create a lot of duplicates in the case of a failover. By using
> message counts, it would bound the number of duplicate messages and thus the
> number of messages that the end-user application would have to keep track of
> for de-duplication. It would be performance impacting compared to the timer
> solution, I assume, but would give the end user more control and might be
> worth the trade-off.

Sure, that's a good point and it's perfectly possible to do. For many
features we have these double constraints on messages and time. I just
wanted to start with the simplest implementation (and config options) and
then expand from there, since it's always easier to add more options and
knobs than to remove them later.

> When there is a single consumer for the global subscription, the behavior is
> clear and well documented here. What happens if multiple consumers are
> consuming from the subscription at the same time on different clusters?
> Presumably the implementation will have to handle the case where a message
> has been consumed locally and then gets notified that the message has been
> consumed remotely--maybe this will just work, I am not sure. In any case,
> having multiple consumers on a global subscription is probably not what the
> end user wants--they likely want a pure active-standby setup.
>
> Would it make sense to somehow mark a subscription as the active one, with
> others being standby? That way, messages can only be consumed from one
> subscription at a time, which makes sense in a disaster recovery scenario.
> This could be configurable by the operator to make the failover to the
> recovery cluster controllable.

With this proposal, the behavior will be:
 * If consumers are attached in multiple regions, they will see some dups
   (e.g., a message is received in both regions); these can be filtered out
   on the application side, as in the sketch below.
 * Some messages might be acknowledged quickly and therefore automatically
   discarded in the other regions.

To summarize:
 * It probably doesn't make sense to have consumers active in multiple
   regions at the same time (on a replicated subscription).
 * It won't cause major harm (only some dups).

The main issue is that deciding which "cluster" is active would require
coordination across multiple regions. That goes against the main goal of
async replication, which is to isolate the clusters from each other as much
as possible and minimize the chance of a single malfunctioning cluster
impacting the others. Even with an operator tool, the switch would have to
be applied consistently across the datacenters.

My take is that it's better to leave the consumer-cluster-failover
operation outside the scope of Pulsar itself, unless we can find a way to
do it in a precise, reliable and consistent way.
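For illustration, here is a minimal consumer-side de-duplication sketch in
Java. The service URL, topic, subscription name, the "app-msg-id" property
and the window size are placeholders; replicateSubscriptionState(true) is
the consumer option proposed here, so its final name or placement may well
differ:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;

    public class ReplicatedSubConsumer {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://cluster-a-broker.example.com:6650")
                    .build();

            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://my-tenant/my-ns/my-topic")
                    .subscriptionName("my-replicated-sub")
                    // Option proposed in this PIP to replicate the
                    // subscription state across clusters
                    .replicateSubscriptionState(true)
                    .subscribe();

            // Bounded window of recently seen business IDs, used to drop
            // the duplicates that can show up right after a fail-over
            Set<String> recentlySeen = Collections.newSetFromMap(
                    new LinkedHashMap<String, Boolean>(10_000, 0.75f, false) {
                        @Override
                        protected boolean removeEldestEntry(
                                Map.Entry<String, Boolean> eldest) {
                            return size() > 10_000;
                        }
                    });

            while (true) {
                Message<byte[]> msg = consumer.receive();
                // "app-msg-id" is a hypothetical application-set property
                // carrying an ID that is stable across clusters
                String businessId = msg.getProperty("app-msg-id");
                if (businessId == null || recentlySeen.add(businessId)) {
                    handle(msg);
                }
                consumer.acknowledge(msg);
            }
        }

        private static void handle(Message<byte[]> msg) {
            System.out.println("Processing " + msg.getMessageId());
        }
    }

The point is just that the duplicates after a fail-over are bounded (by the
snapshot interval, or by a message count if we add that option), so a small
bounded window keyed on an application-level ID is enough to filter them
out.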
> Last thought is on message TTL. I assume that works similarly to actually
> consuming the message and will just work. You probably only want to enable
> TTL on only one of the clusters and let it handle the expiry of messages
> for the whole group.

This change won't touch the current logic of TTL. Right now, messages that
have already expired won't be replicated, but that typically only happens
when a network issue between the clusters caused geo-replication to back
up. In normal conditions, messages are replicated immediately after the
local publish, so the TTL is effectively applied in both clusters.
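For reference, message TTL stays the existing namespace-level policy. A
minimal sketch of setting it through the Java admin client (the admin URL,
namespace and TTL value are placeholders):

    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class SetMessageTtl {
        public static void main(String[] args) throws Exception {
            // Point the admin client at the cluster being configured
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://cluster-a-broker.example.com:8080")
                    .build();

            // 1-hour message TTL on the namespace
            admin.namespaces().setNamespaceMessageTTL("my-tenant/my-ns", 3600);

            admin.close();
        }
    }

Since messages normally replicate right after the local publish, the same
TTL ends up being enforced independently in each cluster.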