Re: Mirror Maker 2.0 Queries

2020-07-13 Thread Ryanne Dolan
Ananya, yes, the driver is distributed, but each worker communicates only
via Kafka. The workers do not listen on any ports.
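
For reference, a minimal sketch of a dedicated MM2 setup (the cluster
aliases, hosts and topic pattern below are placeholders, not a definitive
config):

  # mm2.properties - one-way replication between two clusters
  clusters = source, target
  source.bootstrap.servers = source-host:9092
  target.bootstrap.servers = target-host:9092
  source->target.enabled = true
  source->target.topics = .*

  # start one driver per server; they coordinate through Kafka itself
  bin/connect-mirror-maker.sh mm2.properties

Running the same command with the same config file on several servers
forms the cluster: the workers share the replication load and fail over
via Kafka, with no listener port involved.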

Ryanne

On Sat, Jul 11, 2020, 11:28 AM Ananya Sen  wrote:

> Hi
>
> I was exploring Mirror Maker 2.0. I read through the
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> documentation and I have a few questions.
>
>    1. For running mirror maker as a dedicated mirror maker cluster, the
>    documentation specifies a config file and a starter script. Is this
>    mirror maker process distributed?
>    2. I could not find any port configuration for the above mirror maker
>    process. Can we configure mirror maker itself to run as a cluster, i.e.
>    run process instances across multiple servers to avoid downtime due to
>    a server crash?
>    3. If we can run mirror maker as a distributed process, does that mean
>    that topic and consumer offset replication will be shared among those
>    mirror maker processes?
>    4. What is the default port of this mirror maker process, and how can
>    we override it?
>
> Looking forward to your reply.
>
>
> Thanks & Regards
> Ananya Sen
>


Re: NotEnoughReplicasException: The size of the current ISR Set(2) is insufficient to satisfy the min.isr requirement of 3

2020-07-13 Thread Nag Y
Thanks Liam. So the phrase "current ISR set" in the warning refers to the
ISR set shown by the kafka-topics describe command?
And also, should the "maximum" value of "min.insync.replicas" be
(replication factor - 1)? I mean, min.insync.replicas should not be the
same as the "replication factor".

Please confirm

On Tue, Jul 7, 2020 at 1:22 PM Liam Clarke-Hutchinson <
liam.cla...@adscale.co.nz> wrote:

> Hi Nag,
>
> ISR is the set of replicas that are in sync with the leader, and there's
> a different ISR set for each partition of a given topic. If you use
> `kafka/bin/kafka-topics --describe --topic <topic>` it'll show you the
> replicas and the ISR for each partition.
>
> min.insync.replicas and replication factor are both about preventing data
> loss. Generally I set min ISR to 2 for a topic with a replication factor
> of 3, so that one down or struggling broker doesn't prevent producers
> writing to topics, but I still have a replica of the data in case the
> broker acting as leader goes down - a new partition leader can only be
> elected from the in-sync replicas.
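>
> For example (just a sketch, reusing the zookeeper address and topic name
> from your message), min ISR can also be set per topic rather than
> broker-wide:
>
>   # --describe shows Replicas and Isr for each partition
>   bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --describe \
>     --topic topic-ack-all
>
>   # override min.insync.replicas for this topic only
>   bin/kafka-configs.sh --zookeeper 127.0.0.1:2181 --alter \
>     --entity-type topics --entity-name topic-ack-all \
>     --add-config min.insync.replicas=2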
>
> On Tue, Jul 7, 2020 at 7:39 PM Nag Y  wrote:
>
> > I had the following setup: 3 brokers, all up and running, with
> > min.insync.replicas=3.
> >
> > I created a topic with the following configuration
> >
> > bin\windows\kafka-topics --zookeeper 127.0.0.1:2181 --topic topic-ack-all
> > --create --partitions 4 --replication-factor 3
> >
> > I triggered the producer with acks=all and the producer was able to send
> > messages. However, the problem starts when I start the consumer:
> >
> > bin\windows\kafka-console-consumer --bootstrap-server
> > localhost:9094,localhost:9092 --topic topic-ack-all --from-beginning
> >
> > The error is
> >
> > NotEnoughReplicasException: The size of the current ISR Set(2) is
> > insufficient to satisfy the min.isr requirement of 3
> > NotEnoughReplicasException: The size of the current ISR Set(3) is
> > insufficient to satisfy the min.isr requirement of 3 for partition __con
> >
> > I see two kinds of errors here. I went through the documentation and
> > have some understanding of "min.isr"; however, these error messages are
> > not clear.
> >
> >    1. What is meant by the current ISR set? Is it different for each
> >    topic, and what does it signify?
> >    2. I guess min.isr is the same as min.insync.replicas. Should it have
> >    a value at least equal to the "replication factor"?
> >
>


Kafka connect after restart doesn't read connectors configuration

2020-07-13 Thread Сергей Булавинцев
Hi,

I have a question regarding kafka connect and how it operates in
distributed mode. We use docker swarm to deploy kafka connect, with only
one instance of it in the swarm. After the node running kafka connect
crashed and another instance started on a new node, the new instance
reads the configuration topic from offset 385, despite seeking to the
beginning of the topic. As far as I can tell from the kafka connect code,
the KafkaBasedLog class always tries to read the configuration topic from
the beginning, even if there are committed offsets; in our case, however,
it reads from the last offset and no connector configuration is read.
Does anyone have any idea why this happens and how it could be fixed?
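
In case it is useful for reproducing this: the raw contents of the config
topic can be dumped directly (a sketch - the actual topic name comes from
config.storage.topic in the worker properties; connect-configs below is
just the common default):

  bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 \
    --topic connect-configs --from-beginning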

Thanks.


ktable - ktable join

2020-07-13 Thread Dumitru-Nicolae Marasoui
Hello kafka community,
In a ktable-ktable join, assuming that kt1 has k1 and v1 that contains k2
and kt2 has k2 and v2,
is it possible that the (k1, v1, k2, v2) pair is never emitted?
I am trying to understand how it works and whether any race condition is
possible.
If race conditions are not possible then, ignoring any deduplication or
filtering, what I imagine would be emitted is one of:
(k1, v1, null, null), (k1, v1, k2, v2) or
(null, null, k2, v2), (k1, v1, k2, v2)
If this is the case, and given that our data is immutable for this
particular k1 - k2 - v2 relationship, I could just filter for the cases
where k1, v1, k2 and v2 are all non-null and emit only those tuples.
But would it be possible for the two linked messages, arriving on
different topics, to be processed concurrently? Theoretically I think so.
What would be the solution to this? A linearizable store like a
single-threaded Redis?
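
For concreteness, a minimal sketch of the join I have in mind, assuming
the foreign-key KTable-KTable join from Kafka Streams 2.4+ (KIP-213); the
K1/V1/K2/V2/Pair types and the getK2 accessor are hypothetical:

  StreamsBuilder builder = new StreamsBuilder();
  KTable<K1, V1> kt1 = builder.table("t1");
  KTable<K2, V2> kt2 = builder.table("t2");
  KTable<K1, Pair> joined = kt1.join(
      kt2,
      v1 -> v1.getK2(),             // extract the k2 reference held in v1
      (v1, v2) -> new Pair(v1, v2)  // inner join: fires once both sides exist
  );
  joined.toStream().to("t1-t2-joined");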
Thank you,

-- 
Dumitru-Nicolae Marasoui
Software Engineer


3 kafka-streams or a single kafka streams

2020-07-13 Thread Dumitru-Nicolae Marasoui
Hello kafka community,
I imagine the following behavior, which I could code as 3 kafka-streams
pipelines, and I am wondering if it can be done with fewer kafka streams
while keeping the same guarantees:

I have 3 compacted topics, t1, t2 and t3, where t2 is the link (many-many)
between t1 & t3.
The same structure holds for t4-t6.

I need to replicate t1-3 in t4-6 with a slightly different domain and move
a column (avro attribute) from t1 to t6 to adapt the domains.

In a multi-kafka-streams-pipelines paradigm I could do the following:

I would create 2 or 3 intermediary topics:
- a topic keyed on the t1 key, which gets copies of events from both t1 &
t2, with both message types keyed on the t1 key (kafka-streams-1, which
merges the 2 streams based on t1 & t2 with slight re-keying)
- a topic keyed on the t3 key, which gets copies of events from both t3 &
t2, with both message types keyed on the t3 key (kafka-streams-2, which
merges the 2 streams based on t3 & t2 with slight re-keying)

Up to this point I have created 2 joins, and I know joins exist in
KafkaStreams, but for these joins I can understand why they would be
linearizable: the messages on topics t1 & t2 are sent to a common pipe
with a total order between the messages keyed on any particular t1 key,
so the state processing is linearizable at the level of each t1 entity
(including t1 messages & t2 messages).

Now the output topics have the same key: the t2 key (which is the t1+t3
keys combined).

Now I can join these topics, again sending them both to a 3rd topic with
this join-via-merge strategy, which I understand will not lose
combinations because it does not allow concurrency: all the messages
keyed on a specific t2 key (a specific t1+t3 key combination) are read in
order and applied in single-threaded fashion, linearizably
(kafka-streams-3).

So from these 2+1 pipelines I can produce output back to t6, where I
rewrite the t6 records with records that contain a new value taken from
t1.

Does this make sense?
Would you think that it is doable in fewer pipelines?
Would using joins instead of these merges provide the same guarantees of
single-threaded processing across topics? I think not.
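
To make kafka-streams-1 above concrete, here is a minimal sketch of the
merge + re-key step (the topic names, the Event type and the extractT1Key
helper are hypothetical):

  StreamsBuilder builder = new StreamsBuilder();
  KStream<K1, Event> t1Events = builder.stream("t1");
  // re-key t2 records by the t1 key embedded in their value, then merge
  KStream<K1, Event> t2Events = builder.<K2, Event>stream("t2")
      .selectKey((k2, v) -> extractT1Key(v));
  t1Events.merge(t2Events).to("t1-and-t2-by-t1-key");

The output topic then carries both message types partitioned on the t1
key, which is what gives the per-t1-entity total order described above.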
Thank you,

-- 
Dumitru-Nicolae Marasoui
Software Engineer


Re: 3 kafka-streams or a single kafka streams

2020-07-13 Thread Dumitru-Nicolae Marasoui
Hello kafka community,
Sorry - besides the 3 kafka streams that just merge topics into a common
topic, there must also be consumers of those merged topics. These keep
local state and, once both t1 and t2 values exist for a particular t1
key, emit the pair; they keep emitting pairs for each t1 key whenever a
new value arrives in either t1 or t2, on the condition that both t1 and
t2 records are present in the local state for that t1 key.

So these consumers would do the join itself with the help of a local
database.

I know it sounds a lot like the RocksDB in kafka-streams, but this is how
I can understand it working while preventing unsafe concurrency (e.g.
race conditions).

So each consumer consumes from a single topic with 2 event/message types
and, as soon as it has a message of type t1 for which a t2 counterpart
exists in the local state (or the other way around), it emits a message
to its output (join) topic.

Can this system of 3 merges (kafka streams) + 3 joins (consumers), which
keeps processing single-threaded at the right shards/partitions and does
not lose pairs or combinations to incorrect concurrency, be done in a
simpler way?
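
For what it's worth, the consumer-plus-local-state step sounds like what
an aggregation over the merged topic would do inside kafka-streams; a
rough sketch (the Event and PairAgg types and their methods are
hypothetical):

  builder.<K1, Event>stream("t1-and-t2-by-t1-key")
      .groupByKey()
      .aggregate(PairAgg::new, (k, event, agg) -> agg.add(event))
      .toStream()
      .filter((k, agg) -> agg.hasBothSides())  // emit complete pairs only
      .mapValues(PairAgg::toPair)
      .to("t1-t2-pairs");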

Thanks,
Nicolae



-- 
Dumitru-Nicolae Marasoui
Software Engineer


Re: 3 kafka-streams or a single kafka streams

2020-07-13 Thread Dumitru-Nicolae Marasoui
Hello kafka community,
Instead of the consumers, I can also have kafka-streams do the stateful
transformation (aggregation / fold) of both types of events and detect
when both sides are present in order to emit a pair.
In fact I thought a bit more, and in my particular case, if we agree that
I do not need to rewrite t6 (which is dangerous with respect to
concurrency) but can instead create a t7 topic with just the association
info I need to emit, which downstream consumers can join (hopefully in a
concurrency-safe and eventually consistent way), then I can do it all
with 2 kafka streams: a merge and an aggregation, potentially skipping
the topic in the middle. But I am not sure about the ordering, so I think
I will keep the topic between the two kafka streams rather than using
just one, to make sure of the ordering and of single-threaded processing
at the right granularity.
Thank you,


kafka-streams merge + aggregate vs merge + to topic + from topic + aggregate

2020-07-13 Thread Dumitru-Nicolae Marasoui
Hi,
I would like to understand the ordering guarantees, if any, at the merge
operator level. I think that unless I am writing into a topic there can
be no ordering guarantees; is that so?

I would normally do a key transformation, merge, and write to an output
topic, whose partition (key) ordering guarantees I know, and then have a
second kafka-streams app go from that topic through the aggregation to
the final topic.

Is there a way to make kafka-streams do this behind the scenes? I do not
see a way to guarantee the ordering without an intermediate topic,
implicit or explicit.
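
For reference, a rough sketch of the single-topology version (the NewKey,
Event and Agg types and the extractNewKey helper are hypothetical).
Unless I am mistaken, a groupByKey that follows a key-changing operation
makes kafka-streams insert an internal repartition topic automatically,
which would be exactly the implicit intermediate topic in question:

  KStream<NewKey, Event> merged = stream1.merge(stream2)
      .selectKey((k, v) -> extractNewKey(v));
  merged
      .groupByKey()  // an internal repartition topic is created here
      .aggregate(Agg::new, (k, v, agg) -> agg.add(v))
      .toStream()
      .to("final-topic");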
Thank you,

-- 
Dumitru-Nicolae Marasoui
Software Engineer