Kafka Consumer as a ReactiveStream

2016-07-27 Thread Joe San
Hi Kafka Users, 0down votefavorite I have the following logic in my mind and would like to know if this is a good approach to designing a scalable consumer. I have a topic that has say 10 partitions and I have a con

Re: Log retention not working

2016-07-27 Thread Manikumar Reddy
also check if any value set for log.retention.bytes broker config On Wed, Jul 27, 2016 at 8:03 PM, Samuel Taylor wrote: > Is it possible that your log directory is in /tmp/ and your OS is deleting > that directory? I know it's happened to me before. > > - Samuel > > On Jul 27, 2016 13:43, "David

Re: performance problem when reading lots of small files created by spark streaming.

2016-07-27 Thread Andy Davidson
Opps sorry , wrong mail list From: Andrew Davidson Date: Wednesday, July 27, 2016 at 7:10 PM To: "users@kafka.apache.org" Subject: performance problem when reading lots of small files created by spark streaming. > I have a relatively small data set however it is split into many small JSON >

performance problem when reading lots of small files created by spark streaming.

2016-07-27 Thread Andy Davidson
I have a relatively small data set however it is split into many small JSON files. Each file is between maybe 4K and 400K This is probably a very common issue for anyone using spark streaming. My streaming app works fine, how ever my batch application takes several hours to run. All I am doing is

RE: Problems with replication and performance

2016-07-27 Thread Krzysztof Nawara
I'd happily go the partition route, but I'm not really the one making the final decision. And currently the design leans toward the topic route, so I'd like to squeeze as much as possible out of it. I'm fairly certain I'd made some mistake in configuration/code, because just as you said (and th

Re: Problems with replication and performance

2016-07-27 Thread David Garcia
Sounds like you might want to go the partition route: http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ If you lose a broker (and you went the topic route), the probability that an arbitrary topic was on the broker is higher than if you had gone the pa

Problems with replication and performance

2016-07-27 Thread Krzysztof Nawara
Hi! I've been testing Kafka. I've hit some problems, but I can't really understand what's going on, so I'd like to ask for your help. Situation - we want to decide whether to go for many topics/a couple of partitions or the other way around, so I'be trying to benchmark both cases. During tests

RE: Negative message latency

2016-07-27 Thread Tauzell, Dave
One test where I sent messages of size 3kb and was running the "kafka-console-consumer" was getting around 34ms end-to-end latency. When I sent larger messages, up to 1MB, I started to see higher latency but it was under 250ms. -Dave Dave Tauzell | Senior Software Engineer | Surescripts O: 6

RE: Negative message latency

2016-07-27 Thread Luo, Chao
Hi Dave, When I tested Kafka, I also observed negative latency sometimes. I guess it is because the slight clock difference between producers and consumers. Btw, what is your typical producer-to-consumer latency? My system has an average of 250 milliseconds. I cannot reduce it anymore. Best,

Negative message latency

2016-07-27 Thread Tauzell, Dave
I'm seeing this reported when I run lots of small messages (3K) through Kakfa: Negative message latency Could this be because the clocks on the producer/consumer servers are slightly off? -Dave Dave Tauzell | Senior Software Engineer | Surescripts O: 651.855.3042 | www.surescripts.com

Re: Log retention not working

2016-07-27 Thread Samuel Taylor
Is it possible that your log directory is in /tmp/ and your OS is deleting that directory? I know it's happened to me before. - Samuel On Jul 27, 2016 13:43, "David Yu" wrote: > We are using Kafka 0.8.2.0 provided by CDH. Our Kafka retention is set to > default 7 days. One problem we have with

Log retention not working

2016-07-27 Thread David Yu
We are using Kafka 0.8.2.0 provided by CDH. Our Kafka retention is set to default 7 days. One problem we have with one of our topics is that, the logs are purged within two days. This topic does not override any default settings. Just wanna get some pointers as to how to debug this issue. Thanks,

Re: Handling long commits during a rebalance

2016-07-27 Thread Joey Echeverria
Thanks Craig and Ewen! -Joey On Wed, Jul 27, 2016 at 2:38 AM, craig w wrote: > In my case, nowhen a rebalance occurs the work being performed can't be > "paused" and picked up again later, it has to be started again, so when > onPartitionsRevoked occurs the blocking queue is cleared...again

Re: Kafka Connect issues

2016-07-27 Thread Kristoffer Sjögren
The workers seems happier when reducing number of partitions for each worker. And when adding more topics they eventually die into a rebalancing state. May I ask what's a good configuration? At the moment we have... - 2 docker instances with 4 cores, 4 GB heap - each instance reads 4000 kB/s and

How many TCP connections Java producers opens to feed data to broker?

2016-07-27 Thread Vladimir Picka
Hello, does it use just one connection to one broker? One connection to each broker? Or opens multiple connection to one broker to achieve higher throughput? Many thanks, Petr

Embedded Kafka Broker Health Check

2016-07-27 Thread Enrico Olivelli - Diennea
Hi, I'm running Kafka launching KafkaServerStartable inside my JVM (I'm using version 0.10.0.0). I'm accessing the internal KafkaServer field using reflection Field serverField = kafkaServer.getClass().getDeclaredField("server"); serverField.setAccessible(true); KafkaServer server = (KafkaServer)

Re: Handling long commits during a rebalance

2016-07-27 Thread craig w
In my case, nowhen a rebalance occurs the work being performed can't be "paused" and picked up again later, it has to be started again, so when onPartitionsRevoked occurs the blocking queue is cleared...again may not ideal, but works for this use case. On Tue, Jul 26, 2016 at 8:53 PM, Joey Ech

Re: Handling long commits during a rebalance

2016-07-27 Thread Ewen Cheslack-Postava
Joey, You'll probably want to look into https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread That should address the long, minutes-long timeout you're referring to with onPartitionsRevoked(). If you need to address it in the meantim