Re: Question: Kafka as a message queue for long running tasks

2019-08-26 Thread Raman Gupta
Gerard, we have a similar use case we are using Kafka for, and are setting max.poll.interval.ms to a large value in order to handle the worst-case scenario. Rebalancing is indeed a big problem with this approach (and not just for "new" consumers as you mentioned -- adding consumers causes a stop-t

Streams odd timeout error at startup

2019-08-16 Thread Raman Gupta
I'm experiencing an error with a particular stream, which is consistently giving the following error at start-up time : ``` 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout exception caught when initializing transactio

Re: Kafka Streams close timeout calculation

2019-08-16 Thread Raman Gupta
Why not just set it to a really big number, try closing about 10 times with a stopwatch, and see what your actual close times distribution looks like? On Fri, Aug 16, 2019 at 6:19 AM Simitchiyski, Kaloyan wrote: > > Hello, > > Is there a good way to calculate an efficient timeout for the > Kafka

Re: Rebalancing algorithm is extremely suboptimal for long processing

2019-07-25 Thread Raman Gupta
handling this. > > > > > > I also support several services where average message processing time > > takes > > > 20 seconds per message but p99 time is about 20 minutes and the > > > stop-the-world rebalancing is very painful > > > > > >

Re: Rebalancing algorithm is extremely suboptimal for long processing

2019-07-19 Thread Raman Gupta
Jul 19, 2019 at 12:53 PM Raman Gupta wrote: > > I have a situation in which the current rebalancing algorithm seems to > be extremely sub-optimal. > > I have a topic with 100 partitions, and up to 100 separate consumers. > Processing each message on this topic takes between 1

Rebalancing algorithm is extremely suboptimal for long processing

2019-07-19 Thread Raman Gupta
I have a situation in which the current rebalancing algorithm seems to be extremely sub-optimal. I have a topic with 100 partitions, and up to 100 separate consumers. Processing each message on this topic takes between 1 and 20 minutes, depending on the message. If any of the 100 consumers dies o

Topic migration with Kafka exactly once streams

2019-07-12 Thread Raman Gupta
I have migrated topic-v1 with 10 partitions to topic-v2, with 100 partitions. I have a stateless exactly-once kafka stream currently reading from topic-v1, and wish to update the stream to read from topic-v2 instead. Given the different number of partitions, is there any way to do this at the Kaf

Re: Streams bug with auto.offset.reset = none?

2019-07-10 Thread Raman Gupta
Just to close the loop on this for the mailing list: after discussion with Matthias Sax on Slack, I created this issue: https://issues.apache.org/jira/browse/KAFKA-8650. Regards, Raman On Tue, Jul 9, 2019 at 12:43 PM Raman Gupta wrote: > > I have a stream that is configured for exactl

Streams bug with auto.offset.reset = none?

2019-07-09 Thread Raman Gupta
I have a stream that is configured for exactly-once processing. I set the "auto.offset.reset" of this stream to "none" for two reasons: 1) If an exactly-once stream loses its offset for some reason, using either of earliest or latest is dangerous: "earliest" implies reprocessing data, and "latest"

Streams configuration for a stream with long varied processing times

2019-05-22 Thread Raman Gupta
I have a situation in which I have a stream that does a transformation. This transformation can take as little as 10-30s or up to about 15 minutes to run. The stream works just fine, until a rebalance happens in the middle of long processing. Rebalancing sometimes takes a long time, and sometimes,

Re: Event Sourcing question

2019-05-08 Thread Raman Gupta
If ordering of these events is important, then putting them in the same topic is not only desired, it's necessary. See https://www.confluent.io/blog/put-several-event-types-kafka-topic/. However, think hard about whether ordering is actually important in your use case or not, as things are certainl

Re: Race condition with stream use of Global KTable

2019-04-12 Thread Raman Gupta
topology.addSinkNode(... StreamPartitioner) for > Processor users, to override the default partitioning key from the message > key to scheme that is consistent with the key of topic-1. Then we can > guarantee they are co-partitioned. > > Guozhang > > On Tue, Apr 9, 2019 at 7:59

Re: Race condition with stream use of Global KTable

2019-04-08 Thread Raman Gupta
topic-2 does not need to rely on it being partitioned by the > key to perform other operations, it is okay), and then 3) we can just issue > `get` as part of the lower-level processor API rather than performing a > join to query the materialized table from topic-1. Does that make sen

Re: Race condition with stream use of Global KTable

2019-04-04 Thread Raman Gupta
ormation result of topic-1 via "other stream")? > > > > Guozhang > > On Tue, Apr 2, 2019 at 4:07 PM Raman Gupta wrote: > > > Yes, I forgot to show an item on the topology: > > > >+---> global-ktable +-+ > >

Re: Kafka Broker Config (logs.dir) on Kubernetes

2019-04-04 Thread Raman Gupta
Note you shouldn't cross-post to both the users and dev list -- this kind of question belongs on the user list. The fundamental things you need to go investigate: * Kubernetes Stateful Sets * Kafka packaged for use on Kubernetes -- I have been happy with https://github.com/Yolean/kubernetes-kafka.

Re: Race condition with stream use of Global KTable

2019-04-02 Thread Raman Gupta
e a bit more on why do you want to use the same source topic for two > entities in your topology? > > > Guozhang > > On Tue, Apr 2, 2019 at 3:41 PM Raman Gupta wrote: > > > I have a topology like this: > > > >+---> global-ktable +--

Race condition with stream use of Global KTable

2019-04-02 Thread Raman Gupta
I have a topology like this: +---> global-ktable +-+ | | + v topic-1stream + ^ | | +

Re: Streams reset offsets for no apparent reason

2019-02-22 Thread Raman Gupta
this. Thanks for your help Matthias. Regards, Raman On Fri, Feb 22, 2019 at 11:13 AM Raman Gupta wrote: > Hmm, it turns out I was mistaken -- we are running Kafka 2.0.0, not 2.1.0. > We are on 2.1.0 on the client side, not the server. > > Reading through KAFKA-4682, a

Re: Streams reset offsets for no apparent reason

2019-02-22 Thread Raman Gupta
ed > as long as the consumer group is online. The `offset.retention.minutes` > config starts to tick when the consumer groups goes offline (cf > https://issues.apache.org/jira/browse/KAFKA-4682) > > > -Matthias > > On 2/21/19 3:08 PM, Raman Gupta wrote: > > I am un

Re: Streams reset offsets for no apparent reason

2019-02-21 Thread Raman Gupta
ch the Java consumer Fetcher may print the log message "Resetting offset for partition {} to offset {}."? Regards, Raman On Thu, Feb 21, 2019 at 2:00 AM Matthias J. Sax wrote: > Thanks for reporting the issue! > > Are you able to reproduce it? If yes, can you maybe provide

Streams reset offsets for no apparent reason

2019-02-20 Thread Raman Gupta
I have an exactly-once stream that reads a topic, transforms it, and writes new messages into the same topic as well as other topics. I am using Kafka 2.1.0. The stream applications run in Kubernetes. I did a k8s deployment of the application with minor changes to the code -- absolutely no changes

Re: rebalancing latency spikes on high throughput kafka-streams services

2019-01-17 Thread Raman Gupta
The first thing I'd take a look at is your `max.poll.records` setting. The default for streams is 1000 (see https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-values). Depending on your workloads, this could definitely cause long rebalances -- it did for me, but m

Manage offset stored in kafka 0.8.2

2017-04-14 Thread Raman Gupta
Hi, I am using kafka 0.8.2 and storing offsets in kafka. I want to reset all partition offsets to earliest.I know one way to do it using java program. Is there any other way ? Please suggest. Regards, Raman Gupta