Thanks for the explanation, Ashish. Glad you made it work with a custom source.

I guess your application is probably stateless. If so, another option might
be a geo-distributed Flink deployment, i.e. TaskManagers in different
datacenters forming a single Flink cluster. This also comes with failover if
one of the TMs goes down. I am not sure whether anyone has tried this. It is
probably a heavier solution than using Kafka to do the failover, but the
upside is that you could also do some stateful processing if you have
globally accessible storage for the state backups.
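To make that concrete, a minimal sketch of what the shared-storage side could
look like in flink-conf.yaml, assuming ZooKeeper-based HA and an S3 bucket
reachable from both datacenters (the bucket name and ZooKeeper quorum below
are placeholders, not a tested setup):

```yaml
# Hypothetical flink-conf.yaml fragment; bucket and quorum are placeholders.
# ZooKeeper-based HA so a standby JobManager in the other datacenter can take over.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-dc1:2181,zk-dc2:2181
high-availability.storageDir: s3://shared-bucket/flink/ha

# Checkpoints/savepoints on storage reachable from both datacenters,
# so state can be restored wherever the job is rescheduled.
state.backend: rocksdb
state.checkpoints.dir: s3://shared-bucket/flink/checkpoints
state.savepoints.dir: s3://shared-bucket/flink/savepoints
```

The cross-datacenter network latency to ZooKeeper and the state storage is the
main cost here, which is why this is a heavier option than Kafka-level failover.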

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 4, 2019 at 11:00 AM Ashish Pokharel <ashish...@yahoo.com> wrote:

> Thanks Becket,
>
> Sorry for the delayed response. That’s what I thought as well. I built a
> hacky custom source today directly using the Kafka client, which was able to
> join a consumer group etc. It works as I expected, but I'm not sure about
> production readiness for something like that :)
>
> The need arises because of (1) business continuity needs, and (2) some of
> the pipelines we are building are close to the network edge and need to run
> on nodes where we are not allowed to create a cluster (yea - let’s not get
> into that can of security-related worms :)). We will get there at some
> point, but for now we are trying to support business continuity on those
> edge nodes by not actually forming a cluster but using “walled garden”
> individual Flink servers. I fully understand this is not ideal. All of this
> started because some of the work we were doing with Logstash needed to be
> migrated out, as Logstash wasn’t able to keep up with the data rates unless
> we used a ridiculous number of servers. In essence, we have pre-approved
> constraints to connect to Kafka and southbound interfaces using Logstash,
> which we need to replace for some datasets that are too massive for
> Logstash to keep up with.
>
> Hope that explains a bit where our head is at.
>
> Thanks, Ashish
>
> On Aug 29, 2019, at 11:40 AM, Becket Qin <becket....@gmail.com> wrote:
>
> Hi Ashish,
>
> You are right. Flink does not use Kafka-based group management, so if you
> have two clusters consuming the same topic, they will not divide the
> partitions between them. Cross-cluster HA is not quite possible at this
> point. It would be good to know the reason you want such HA and see whether
> Flink can meet your requirement in another way.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Aug 29, 2019 at 9:19 PM ashish pok <ashish...@yahoo.com> wrote:
>
>> Looks like Flink uses Kafka's “assign” API for partitions instead of
>> “subscribe”, which does not allow participating in a consumer group, if I
>> read the code correctly.
>>
>> Has anyone solved this type of problem in the past - active-active HA
>> across 2 clusters using Kafka?
>>
>>
>> - Ashish
>>
>> On Wednesday, August 28, 2019, 6:52 PM, ashish pok <ashish...@yahoo.com>
>> wrote:
>>
>> All,
>>
>> I was wondering what the expected default behavior is when the same app is
>> deployed in 2 separate clusters but with the same group ID. In theory the
>> idea was to create active-active across separate clusters, but it seems
>> like both apps are getting all the data from Kafka.
>>
>> Has anyone else tried something similar, or does anyone have insight into
>> the expected behavior? I was expecting to see partial data on both apps,
>> and to get all the data in one app if the other was turned off.
>>
>> Thanks in advance,
>>
>> - Ashish
>>
>>
>
