Thanks - that sounds like a good model to at least explore. We are essentially 
stateless at this point for this particular need. 


- Ashish

On Tuesday, September 3, 2019, 11:28 PM, Becket Qin <becket....@gmail.com> 
wrote:

Thanks for the explanation Ashish. Glad you made it work with custom source.
I guess your application is probably stateless. If so, another option might be 
having a geo-distributed Flink deployment. That means there will be TM in 
different datacenter to form a single Flink cluster. This will also come with 
failover if one of the TM is down. I am not sure if anyone have tried this. It 
is probably a heavier solution than using Kafka to do the failover, but the 
good thing is that you may also do some stateful processing if you have a 
globally accessible storage for the state backup.
Thanks,
Jiangjie (Becket) Qin
On Wed, Sep 4, 2019 at 11:00 AM Ashish Pokharel <ashish...@yahoo.com> wrote:

Thanks Becket,
Sorry for delayed response. That’s what I thought as well. I built a hacky 
custom source today directly using Kafka client which was able to join consumer 
group etc. which works as I expected but not sure about production readiness 
for something like that :)
The need arises because of (1) Business continuity needs (2) Some of the 
pipelines we are building are close to network edge and need to run on nodes 
where we are not allowed to create cluster (yea - let’s not get into that can 
of security related worms :)). We will get there at some point but for now we 
are trying to support business continuity on those edge nodes by not actually 
forming a cluster but using “walled garden” individual Flink server. I fully 
understand this is not ideal. And all of this started because some of the work 
we were doing with Logstash needed to be migrated out as Logstash wasn’t able 
to keep up with data rates unless we put some ridiculous number of servers. In 
essence, we have pre-approved constraints to connect to Kafka and southbound 
interfaces using Logstash, which we need to replace for some datasets as they 
are massive for Logstash to keep up with. 
Hope that explains a bit where our head is at.
Thanks, Ashish 


On Aug 29, 2019, at 11:40 AM, Becket Qin <becket....@gmail.com> wrote:
Hi Ashish,
You are right. Flink does not use Kafka based group management. So if you have 
two clusters consuming the same topic, they will not divide the partitions. The 
cross cluster HA is not quite possible at this point. It would be good to know 
the reason you want to have such HA and see if Flink meets you requirement in 
another way.
Thanks,
Jiangjie (Becket) Qin
On Thu, Aug 29, 2019 at 9:19 PM ashish pok <ashish...@yahoo.com> wrote:

Looks like Flink is using “assign” partitions instead of “subscribe” which will 
not allow participating in a group if I read the code correctly. 
Has anyone solved this type of problem in past of active-active HA across 2 
clusters using Kafka? 


- Ashish

On Wednesday, August 28, 2019, 6:52 PM, ashish pok <ashish...@yahoo.com> wrote:

All,
I was wondering what the expected default behavior is when same app is deployed 
in 2 separate clusters but with same group Id. In theory idea was to create 
active-active across separate clusters but it seems like both apps are getting 
all the data from Kafka. 
Anyone else has tried something similar or have an insight on expected 
behavior? I was expecting to see partial data on both apps and to get all data 
in one app if other was turned off.

Thanks in advance,

- Ashish









Reply via email to