Re: Kafka Streams changelog topic has 5 times higher out-traffic than in-traffic

2019-10-03 Thread Peter Levart
Hi Hu, On 10/2/19 8:54 PM, Xiyuan Hu wrote: Hi All, I'm doing smoke testing with my Kafka Streams app(V2.1.0). I noticed that below behaviors: 1) Out throughput of changelog topic could go up to 70mb/s while the in-traffic is around 10mb/s. 2) When traffic is bumpy, either due to producer/consu

KSTREAM basics

2019-10-03 Thread David O'Connor
I apologise if this query is trivial, but if I set up a Kafka stream to read from a KTABLE, will the stream contain both of the following? (I) all rows in the table that match the selection criteria when the stream is created, (2) any new rows written to the table while the stream is running. Re

Re: KSTREAM basics

2019-10-03 Thread Boyang Chen
Hey David, if you are talking about calling toStream() on KTABLE, from the java doc: *Note that this is a logical operation and only changes the "interpretation" of the stream, i.e., each record of * this changelog stream is no longer treated as an updated record* So the answer is yes to bot

Re: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Azama
Hi Eric, You've given hardware information about your brokers, but I don't think you've provided information about the machine your producer is running on. Have you verified that you're not reaching any caps on your producer's machine? I also think you might be hitting the limit of what a single

RE: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Owhadi
Hi Eric, Thanks a lot for your answer. Please find inline responses: >>You've given hardware information about your brokers, but I don't think >>you've provided information about the machine your producer is running on. >>Have you verified that you're not reaching any caps on your producer's >>

How auto.offset.reset = latest works

2019-10-03 Thread Hrishikesh Mishra
Hi, I want to understand how does *auto.offset.reset = latest *work. When consumer first call poll() method, will it assign the current offsets to consumer for all partition (when single consumer is up in a consumer group)? How do I know all partitions are assigned to a consumer? Regards Hrishik

RE: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Owhadi
There is a key piece of information that should be critical to guess where the problem is: When I change from ack = all to ack = 1, instead of increasing message/s, it actually devises it by half! As if the problem is about how fast I produce data (given when I use ack 1 I assume I block less

Re: poor producing performance with very low CPU utilization?

2019-10-03 Thread Alexandru Ionita
This might help. Try to replicate the configuration this guy is using for benchmarking kafka. https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Am Do., 3. Okt. 2019 um 22:45 Uhr schrieb Eric Owhadi : > There is a key piece of informatio

Re: poor producing performance with very low CPU utilization?

2019-10-03 Thread Xiyuan Hu
Hi Eric, I had the similar issue on my application. One thing I noticed is, metric `kafka_consumer_io_wait_time_avg_seconds` is switching from 2 mins to 6 hours! Average is about 1-2 hours. This metric is a build-in micrometer metric. I'm still googling the source code of this metric to understand

RE: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Owhadi
To test if my backend was good, I tried using the kafka-producer-perf-test. Boy that was fast! Instead of my lame 200 message per seconds, I am getting 20 000 message per seconds. 100X That is more in line with my expectations. Granted that this test does not use my custom partitioner and s

Re: How auto.offset.reset = latest works

2019-10-03 Thread M. Manna
If I get your question right, your concern isn't about auto.offset.reset - it's the partition assignment. consumer group represents parallelism. It's well-documented in Kafka official docs. Each consumer (in a consumer group) gets fare share of job (i.e. # partitions for a topic subscription). Due

Kstreams and Schema registry question

2019-10-03 Thread KhajaAsmath Mohammed
Hi, I have kstream in our cluster. I am not sure how the schema is generated for this. Dpes kstream have schema in schema registry? How to move data from kstream to topic which has the same name? How to register schema automatically for this? This is how schema looks in our registry. My assumpti

Re: Kafka Streams changelog topic has 5 times higher out-traffic than in-traffic

2019-10-03 Thread Xiyuan Hu
Hi Peter, Thanks for the reply! I noticed that, after deployment, changelog topic has high bytes in/sec and messages/sec, but low bytes out/sec. Once the app is unstable, or traffic is bumpy, it switched: changelog topic has low bytes in/sec and messages/sec but high bytes out/sec. Is it normally?

Re: Kafka Streams changelog topic has 5 times higher out-traffic than in-traffic

2019-10-03 Thread Xiyuan Hu
Hi Peter, Thanks for the reply! I noticed that, after deployment, changelog topic has high bytes in/sec and messages/sec, but low bytes out/sec. Once the app is unstable, or traffic is bumpy, it switched: changelog topic has low bytes in/sec and messages/sec but high bytes out/sec. Is it normally?

RE: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Owhadi
I found one big contributor to the badness was my custom partitioner had a bug (missing a Utils.ToPositive call). I also found the default partitioner use of murmur is very bad, compare to simply doing a hash, for a 5X perf degradation! As you can see bellow, using a good custom partitioner , I