Re: Kafka Scaling Ideas

2020-12-21 Thread Haruki Okada
About the load test: I think it would be better to monitor per-message processing latency and estimate the required partition count from it, because it determines the maximum throughput of a single partition. - Say you have to process 12 million messages/hour ≈ 3,333 messages/sec. - If you have 7 partitions (thus 7
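Haruki's estimate can be made concrete with a little arithmetic. The sketch below assumes each partition is processed serially by one consumer, so a partition's maximum throughput is 1000 / (per-message latency in ms) messages per second. The function names are hypothetical and the latencies are example values, not measurements.

```python
import math


def required_partitions(messages_per_hour: float, per_message_latency_ms: float) -> int:
    """Estimate how many partitions are needed to keep up with the load.

    Assumes each partition is consumed serially by one consumer thread, so a
    single partition can handle at most 1000 / per_message_latency_ms msg/s.
    """
    messages_per_sec = messages_per_hour / 3600
    max_rate_per_partition = 1000 / per_message_latency_ms
    return math.ceil(messages_per_sec / max_rate_per_partition)


def max_latency_ms(messages_per_hour: float, partitions: int) -> float:
    """Largest average per-message latency that still keeps up, under the same assumption."""
    rate_per_partition = messages_per_hour / 3600 / partitions
    return 1000 / rate_per_partition


if __name__ == "__main__":
    print(required_partitions(12_000_000, 2.0))      # -> 7 (example: 2 ms per message)
    print(round(max_latency_ms(12_000_000, 7), 2))   # -> 2.1
```

Under these assumptions, 7 partitions can absorb 12 million messages/hour only if the average per-message processing time stays at or below about 2.1 ms.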

Re: Kafka Scaling Ideas

2020-12-21 Thread Joris Peeters
Do you know why your consumers are so slow? 12E6 msg/hour is about 3,333 msg/s, which is not very high from a Kafka point of view. As you're doing database inserts, I suspect that is where the bottleneck lies. If, for example, you're doing a single-row insert in a SQL DB for every message then this would i
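To illustrate the difference between one insert per message and batched inserts, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database as a stand-in for the real one. The `BatchingSink` class, the table `events`, its columns and the batch size of 500 are all hypothetical. It counts statements rather than timing them, because an in-memory SQLite database has no network round trips; against a remote SQL server each per-row statement and commit typically costs one.

```python
import sqlite3


class BatchingSink:
    """Hypothetical helper: buffers consumed messages and writes them
    with one executemany() call and one commit per batch."""

    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        self.batches_written = 0  # number of executemany() statements issued

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        self.conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", self.buffer)
        self.conn.commit()
        self.batches_written += 1
        self.buffer = []


def insert_one_by_one(conn, rows):
    """The per-message pattern: one INSERT and one commit for every message."""
    for row in rows:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()
    return len(rows)  # one statement per row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    sink = BatchingSink(conn, batch_size=500)
    for i in range(1234):
        sink.add((i, f"message-{i}"))
    sink.flush()  # write the final partial batch
    print(sink.batches_written)  # -> 3 statements instead of 1234
```

In a real consumer you would typically commit Kafka offsets only after a flush succeeds, so a crash re-delivers the unflushed messages instead of losing them.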

Kafka in-sync replicas

2020-12-21 Thread Miroslav Tsvetanov
Hello everyone, I'm running Kafka with 3 brokers, replication factor 3 and in-sync replicas 2. If I set *acks=all* on the producer side, how many brokers should acknowledge the record? Thanks in advance. Best regards, Miroslav

Re: Kafka in-sync replicas

2020-12-21 Thread Tom Bentley
The leader should send the produce response (the acknowledgement) to the producer once the leader has persisted the batch to its log *and* the leader knows that one of the followers has persisted it to its log.
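The interplay between acks=all and min.insync.replicas can be summarized in a simplified model: with acks=all the leader waits until every replica currently in the in-sync replica set (ISR, which includes the leader itself) has the record, and min.insync.replicas only sets the smallest ISR size for which such writes are accepted at all. The function below is a hypothetical illustration of that rule; it ignores edge cases such as the ISR shrinking while a request is in flight.

```python
def acks_all_outcome(replication_factor: int, isr_size: int, min_insync_replicas: int):
    """Simplified model of a produce request with acks=all.

    Returns (accepted, replicas_that_must_have_the_record).
    The leader is counted as an ISR member.
    """
    if not 1 <= isr_size <= replication_factor:
        raise ValueError("ISR size must be between 1 and the replication factor")
    if isr_size < min_insync_replicas:
        # Too few in-sync replicas: the write is rejected, not acknowledged.
        return False, 0
    # The leader waits until every *current* ISR member has the record.
    return True, isr_size
```

For Miroslav's setup (3 brokers, replication factor 3, minimum ISR of 2): with all three replicas in sync, the record must reach the leader and both followers; if one follower drops out of the ISR, the leader and the remaining follower suffice; if only the leader is left, acks=all produce requests fail with a NotEnoughReplicas error instead of being acknowledged.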

Forwarding Kafka gc log to syslog servers

2020-12-21 Thread cool dharma06
Hi team, I am trying to configure Kafka servers to forward the GC (garbage collection) logs to syslog servers, but I couldn't find an option for this in the syslog handler. I also don't want to run any external syslog service on the Kafka server; I would rather use Kafka's syslog handlers. When I went through th

[ANNOUNCE] Apache Kafka 2.7.0

2020-12-21 Thread Bill Bejeck
The Apache Kafka community is pleased to announce the release for Apache Kafka 2.7.0 * Configurable TCP connection timeout and improve the initial metadata fetch * Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1) * Throttle Create Topic, Create Partition and Delete T

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
Thanks Haruki and Joris. Haruki: Thanks for the detailed calculations, I really appreciate it. What tool/lib is used to load test Kafka? We have one consumer group running 7 instances of the application; that should be good enough, correct? Joris: Great point. DB insert is a bottleneck (and
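On whether 7 instances in one consumer group are enough: within a consumer group, each partition is assigned to at most one consumer, so the number of useful instances is capped by the partition count. The hypothetical round-robin function below illustrates this; Kafka's real assignors (range, round-robin, sticky) differ in which partition goes to which consumer, but they share that property.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Hypothetical round-robin assignment within one consumer group.

    Each partition goes to exactly one consumer, so any consumers beyond
    the partition count end up with an empty list (idle).
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment
```

With 7 partitions, 7 instances each get one partition; adding 3 more instances would leave 3 of them idle, so scaling out further means adding partitions as well.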

Re: Kafka Scaling Ideas

2020-12-21 Thread Joris Peeters
I'd probably just do it by experiment for your concrete data. Maybe generate a few million synthetic data rows, and for-each-batch insert them into a dev DB, with an outer grid search over various candidate batch sizes. You're looking to optimise for flat-out rows/s, so whichever batch size wins (
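Joris's grid-search experiment could look roughly like this. For the sake of a self-contained example it assumes an in-memory SQLite database and synthetic rows; in practice you would point the same loop at a dev copy of your real database, because the winning batch size depends heavily on the database and the network. The function names and candidate sizes are hypothetical.

```python
import sqlite3
import time


def rows_per_second(rows, batch_size):
    """Insert all rows into a fresh in-memory table, batch by batch; return rows/s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows[i:i + batch_size])
        conn.commit()  # one commit per batch
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    conn.close()
    return len(rows) / elapsed


def best_batch_size(rows, candidates, repeats=3):
    """Grid search over candidate batch sizes; keep the best of a few runs to reduce noise."""
    scores = {b: max(rows_per_second(rows, b) for _ in range(repeats)) for b in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    synthetic = [(i, f"row-{i}") for i in range(20_000)]
    best, scores = best_batch_size(synthetic, [1, 10, 100, 1_000, 10_000])
    for size, rate in sorted(scores.items()):
        print(f"batch={size:>6}: {rate:,.0f} rows/s")
    print("best:", best)
```

Taking the best of several runs per candidate, rather than a single run, makes the comparison less sensitive to one-off pauses such as garbage collection or other load on the test machine.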

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
Thanks! Also are there any producer optimizations anyone can think of in this scenario?

--override option for bin/connect-distributed.sh

2020-12-21 Thread Aki Yoshida
Hi Kafka team, I think Kafka's --override option is very practical for starting Kafka in various situations without changing the properties file. I missed this feature in Kafka Connect and wanted to have it, so I created a patch in this commit in my forked repo. https://github.com/elakito/ka
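The broker's `kafka-server-start.sh` accepts `server.properties [--override property=value]*`, and values given on the command line replace those read from the file. Assuming a Connect equivalent would follow the same rule, the sketch below illustrates only that merge behaviour; it is not Aki's patch, the function names are hypothetical, and its simplified parser ignores Java properties features such as `:` separators and line continuations.

```python
from typing import Dict, List


def load_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def apply_overrides(props: Dict[str, str], args: List[str]) -> Dict[str, str]:
    """Apply '--override key=value' pairs on top of file properties.

    Later overrides win; other arguments (e.g. the properties file path) are ignored.
    The input dict is left unchanged.
    """
    merged = dict(props)
    it = iter(args)
    for arg in it:
        if arg == "--override":
            pair = next(it, None)
            if pair is None:
                raise ValueError("--override needs a key=value argument")
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"expected key=value after --override, got {pair!r}")
            merged[key] = value
    return merged
```

Keeping the file values untouched and returning a merged copy means the same properties file can be reused with different overrides for each worker.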

Re: In Memory State Store

2020-12-21 Thread John Roesler
Hi Navneeth, Yes, you are correct. I think there are some opportunities for improvement there, but there are also reasons for it to be serialized in the in-memory store. Off the top of my head, we need to serialize stored data anyway to send it to the changelog. Also, even though the store is
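A toy model of the point John makes: if the store already keeps keys and values as serialized bytes, the same bytes can be written to the changelog without a second serialization step. Everything here is hypothetical (JSON stands in for the configured serdes, and a list stands in for the changelog topic); it is not Kafka Streams' actual in-memory store.

```python
import json


class ToyInMemoryStore:
    """Hypothetical store that keeps serialized bytes so the same bytes
    can be appended to a (simulated) changelog."""

    def __init__(self):
        self._data = {}       # serialized key -> serialized value
        self.changelog = []   # stands in for the changelog topic

    @staticmethod
    def _serialize(obj) -> bytes:
        return json.dumps(obj, sort_keys=True).encode("utf-8")

    @staticmethod
    def _deserialize(raw: bytes):
        return json.loads(raw.decode("utf-8"))

    def put(self, key, value):
        k, v = self._serialize(key), self._serialize(value)
        self._data[k] = v
        self.changelog.append((k, v))  # serialized once, reused for the changelog record

    def get(self, key):
        raw = self._data.get(self._serialize(key))
        return None if raw is None else self._deserialize(raw)
```

A side effect of storing bytes, visible in this sketch, is that `get()` returns a fresh copy, so a caller mutating the returned object cannot change the stored state by accident.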

Re: kafka-streams: interaction between max.poll.records and window expiration ?

2020-12-21 Thread John Roesler
Hi Mathieu, I don’t think there would be any problem. Note that window expiry is computed against an internal clock called “stream time”, which is the max timestamp yet observed. This time is advanced as each record is processed. There is a separate clock for each partition, s
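A simplified model of what John describes: stream time is tracked per partition as the maximum record timestamp observed so far and advances record by record, so how many records one poll returns (`max.poll.records`) does not change which records are treated as expired. The sketch assumes tumbling windows and drops a record once its window end plus the grace period is at or before its partition's stream time; class, function and parameter names are hypothetical.

```python
from collections import defaultdict


class ToyWindowExpiry:
    """Toy model of window expiry driven by per-partition stream time."""

    def __init__(self, window_size_ms, grace_ms):
        self.window_size_ms = window_size_ms
        self.grace_ms = grace_ms
        self.stream_time = defaultdict(lambda: float("-inf"))  # partition -> max timestamp seen

    def process(self, partition, timestamp_ms):
        """Return True if the record is accepted, False if its window has expired."""
        # Stream time only moves forward: the max timestamp observed on this partition.
        self.stream_time[partition] = max(self.stream_time[partition], timestamp_ms)
        window_start = timestamp_ms - timestamp_ms % self.window_size_ms
        window_end = window_start + self.window_size_ms
        return window_end + self.grace_ms > self.stream_time[partition]


def run(records, window_size_ms, grace_ms, poll_batch_size):
    """Feed (partition, timestamp) records in poll-sized chunks, a stand-in for max.poll.records."""
    model = ToyWindowExpiry(window_size_ms, grace_ms)
    results = []
    for i in range(0, len(records), poll_batch_size):
        for partition, ts in records[i:i + poll_batch_size]:
            results.append(model.process(partition, ts))
    return results


if __name__ == "__main__":
    records = [(0, 500), (0, 2500), (0, 900), (1, 900)]
    print(run(records, 1000, 0, poll_batch_size=1))  # -> [True, True, False, True]
    print(run(records, 1000, 0, poll_batch_size=3))  # -> [True, True, False, True]
```

In this model the third record (timestamp 900 on partition 0) arrives after that partition's stream time has reached 2500, so its window [0, 1000) is already closed; the same timestamp on partition 1 is still accepted because that partition's clock is only at 900. Changing `poll_batch_size` leaves the result unchanged.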

Re: [ANNOUNCE] Apache Kafka 2.7.0

2020-12-21 Thread Gwen Shapira
woooh!!! Great job on the release Bill and everyone!

Re: [ANNOUNCE] Apache Kafka 2.7.0

2020-12-21 Thread Randall Hauch
Fantastic! Thanks for driving the release, Bill. Congratulations to the whole Kafka community.

Re: Kafka Scaling Ideas

2020-12-21 Thread Haruki Okada
You mean the "first layer", right? Then make sure you don't call get() on the result of Producer#send() for each message, because doing so defeats producer batching. The Kafka producer batches messages by default and is very efficient, so if you produce asynchronously, it rarely beco
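Why a blocking get() per message hurts can be shown with a toy model. In the real Java client, `KafkaProducer#send()` returns a `Future<RecordMetadata>` and a background sender thread ships batches once `batch.size` is reached or `linger.ms` expires; calling `get()` right after each `send()` waits for that record's acknowledgement, so the next record cannot share its batch and each message costs a round trip. The Python classes below are hypothetical and only mimic that effect by flushing whenever a caller waits on an unfinished result.

```python
class ToyFuture:
    """Minimal result holder for the toy producer (not a real Kafka type)."""

    def __init__(self):
        self.done = False
        self.value = None


class ToyBatchingProducer:
    """Hypothetical producer that sends one request per batch of records."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._pending = []        # (record, future) pairs not yet sent
        self.requests_sent = 0    # simulated network requests

    def send(self, record):
        future = ToyFuture()
        self._pending.append((record, future))
        if len(self._pending) >= self.batch_size:
            self.flush()
        return future

    def flush(self):
        if not self._pending:
            return
        self.requests_sent += 1   # the whole batch goes out in one request
        for record, future in self._pending:
            future.done, future.value = True, record
        self._pending = []

    def get(self, future):
        # Waiting on an unsent record forces its (tiny) batch out immediately.
        if not future.done:
            self.flush()
        return future.value


if __name__ == "__main__":
    sync = ToyBatchingProducer(batch_size=100)
    for i in range(1000):
        sync.get(sync.send(i))            # blocking wait after every send
    async_ = ToyBatchingProducer(batch_size=100)
    futures = [async_.send(i) for i in range(1000)]
    async_.flush()
    print(sync.requests_sent, async_.requests_sent)  # -> 1000 10
```

Here 1,000 records cost 1,000 requests when each result is awaited immediately, versus 10 when they are sent first and checked later. With the real client, the asynchronous equivalent is `send(record, callback)` or collecting the futures and checking them afterwards.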

Re: [ANNOUNCE] Apache Kafka 2.7.0

2020-12-21 Thread Guozhang Wang
Thank you Bill! Congratulations to the community.

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
I thought about it, but we don't have much time. Will it improve performance?

Producer closed while allocating memory error

2020-12-21 Thread Dhirendra Singh
I am getting the following error in the Kafka producer:
org.apache.kafka.common.KafkaException: Producer closed while allocating memory
    at org.apache.kafka.clients.producer.internals.BufferPool.allocate(BufferPool.java:151) ~[kafka-clients-2.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Reco