Thanks Becket, Sorry for delayed response. That’s what I thought as well. I built a hacky custom source today directly using Kafka client which was able to join consumer group etc. which works as I expected but not sure about production readiness for something like that :)
The need arises because of (1) Business continuity needs (2) Some of the pipelines we are building are close to network edge and need to run on nodes where we are not allowed to create cluster (yea - let’s not get into that can of security related worms :)). We will get there at some point but for now we are trying to support business continuity on those edge nodes by not actually forming a cluster but using “walled garden” individual Flink server. I fully understand this is not ideal. And all of this started because some of the work we were doing with Logstash needed to be migrated out as Logstash wasn’t able to keep up with data rates unless we put some ridiculous number of servers. In essence, we have pre-approved constraints to connect to Kafka and southbound interfaces using Logstash, which we need to replace for some datasets as they are massive for Logstash to keep up with. Hope that explains a bit where our head is at. Thanks, Ashish > On Aug 29, 2019, at 11:40 AM, Becket Qin <becket....@gmail.com> wrote: > > Hi Ashish, > > You are right. Flink does not use Kafka based group management. So if you > have two clusters consuming the same topic, they will not divide the > partitions. The cross cluster HA is not quite possible at this point. It > would be good to know the reason you want to have such HA and see if Flink > meets you requirement in another way. > > Thanks, > > Jiangjie (Becket) Qin > > On Thu, Aug 29, 2019 at 9:19 PM ashish pok <ashish...@yahoo.com > <mailto:ashish...@yahoo.com>> wrote: > Looks like Flink is using “assign” partitions instead of “subscribe” which > will not allow participating in a group if I read the code correctly. > > Has anyone solved this type of problem in past of active-active HA across 2 > clusters using Kafka? > > > - Ashish > On Wednesday, August 28, 2019, 6:52 PM, ashish pok <ashish...@yahoo.com > <mailto:ashish...@yahoo.com>> wrote: > > All, > > I was wondering what the expected default behavior is when same app is > deployed in 2 separate clusters but with same group Id. In theory idea was to > create active-active across separate clusters but it seems like both apps are > getting all the data from Kafka. > > Anyone else has tried something similar or have an insight on expected > behavior? I was expecting to see partial data on both apps and to get all > data in one app if other was turned off. > > Thanks in advance, > > - Ashish