Failure on timestamp extraction for kafka streams 0.10.2.0

2017-05-02 Thread Sachin Mittal
Hi, We are getting this exception right when we start the stream. org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The timestamp of the message is out of acceptable range. at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Henry Thacker
Thanks all for your replies - I have checked out the docs which were very helpful. I have now moved the separate topic streams to different processes each with their own app.id and I'm getting the following pattern, with no data consumed: "Starting stream thread [StreamThread-1] Discovered coordi

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Eno Thereska
Hi Henry, Could you share the streams configuration for your apps? I.e., the part where you assign application id and all the rest of the configs (just configs, not code). Thanks Eno > On May 2, 2017, at 8:53 AM, Henry Thacker wrote: > > Thanks all for your replies - I have checked out the do

Re: Failure on timestamp extraction for kafka streams 0.10.2.0

2017-05-02 Thread Eno Thereska
Hi Sachin, This should be fixed in 0.10.2.1, could you upgrade to that release? Here is JIRA: https://issues.apache.org/jira/browse/KAFKA-4861 . Thanks Eno > On May 2, 2017, at 8:43 AM, Sachin Mittal wrote: > > The timestamp of the message is

Re: Failure on timestamp extraction for kafka streams 0.10.2.0

2017-05-02 Thread Sachin Mittal
Thanks. Upgrade fixed it. On Tue, May 2, 2017 at 2:48 PM, Eno Thereska wrote: > Hi Sachin, > > This should be fixed in 0.10.2.1, could you upgrade to that release? Here > is JIRA: https://issues.apache.org/jira/browse/KAFKA-4861 < > https://issues.apache.org/jira/browse/KAFKA-4861>. > > Thanks >

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Henry Thacker
Hi Eno, At the moment this is hard coded, but overridable with command line parameters: config.put(StreamsConfig.APPLICATION_ID_CONFIG, appId + "-" + topic); config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers); config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, zookeepers); config.put(StreamsCon
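
For readers following along, a minimal self-contained sketch of the per-topic configuration described above; the serde choices and the surrounding boilerplate are assumptions, not Henry's actual code, and appId, topic, brokers and zookeepers are assumed to come from the command line:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsConfig;

    // One Streams process per topic, each with its own application id.
    Properties config = new Properties();
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, appId + "-" + topic);
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
    config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, zookeepers); // required on 0.10.0.x, removed in later releases
    config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());
    config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());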

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Mahendra Kariya
Hi Matthias, What we did was read the data from sink topic and print it to console. And here's the raw data from that topic (the counts are randomized). As we can see, the data is certainly missing for some time windows. For instance, after 1493693760, the next timestamp for which the data is pres

Does queue.time still apply for the new Producer?

2017-05-02 Thread Petr Novak
The documentation reads as: "As events enter a queue, they are buffered in a queue, until either queue.time or batch.size is reached. A background thread (kafka.producer.async.ProducerSendThread) dequeues the batch of data and lets the kafka.producer.EventHandler serialize and send the data to the
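
For the record, queue.time belongs to the old Scala producer; the new Java producer (org.apache.kafka.clients.producer.KafkaProducer) plays the same trade-off with linger.ms and batch.size. A minimal sketch, with placeholder broker address and values:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("linger.ms", "100");    // max wait for a batch to fill; roughly queue.time's role
    props.put("batch.size", "16384"); // max batch size in bytes, per partition
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);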

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Garrett Barton
Mahendra, One possible cause I have seen that exhibits the same behavior, missing windows of data, is the retention policy configuration of the topics (internal and your own). I was loading data that was fairly old (weeks) and using event time semantics as the record timestamp (custom timesta
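
Since the thread mentions a custom timestamp extractor, a minimal sketch against the 0.10.2-era single-argument TimestampExtractor interface; the SensorReading type and its eventTimeMs() accessor are hypothetical:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    public class SensorTimestampExtractor implements TimestampExtractor {
        @Override
        public long extract(ConsumerRecord<Object, Object> record) {
            Object value = record.value();
            if (value instanceof SensorReading) {
                return ((SensorReading) value).eventTimeMs(); // event time from the payload
            }
            return record.timestamp(); // fall back to the record's own timestamp
        }
    }

It would be wired in via StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG.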

Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Greetings all, I have a use case where I want to calculate some metrics, using event time semantics (record time is event time), against sensor data that I already have. I have years of it, but for this POC I'd like to just load the last few months so that we can start deriving trend lines now vs

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Eno Thereska
Could you make sure a firewall isn't in the way and that the Kafka brokers are set up correctly and can be accessed? Is the SSL port the same as the PLAINTEXT port in your server.properties file? E.g., see this: https://stackoverflow.com/questions/43534220/marking-the-coordinator-dead-for-groupkafka/43
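
On the listener question, a server.properties sketch with distinct PLAINTEXT and SSL ports; hostnames and ports here are placeholders:

    # server.properties (placeholders)
    listeners=PLAINTEXT://:9092,SSL://:9093
    advertised.listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093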

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Damian Guy
Hi Garret, > I was running into data loss when segments are deleted faster than > downstream can process. My knee jerk reaction was to set the broker > configs log.retention.hours=2160 and log.segment.delete.delay.ms=2160 > and that made it go away, but I do not think this is right? > > I th
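
For reference, the broker-side knobs quoted above as a server.properties sketch with illustrative values; note that log.segment.delete.delay.ms is measured in milliseconds, so a value of 2160 there would be about two seconds, not 90 days:

    # server.properties (illustrative values)
    log.retention.hours=2160          # ~90 days of retention
    log.segment.delete.delay.ms=60000 # delay before a deleted segment file is removed, in ms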

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Henry Thacker
Hi Eno, I think I've cracked it finally - I was hit by two problems. Firstly, my listen IP was 0.0.0.0 and clients were trying to connect to this, which obviously wasn't going to work. Part two was that I had stupidly left some code in when I was working with the KStreamBuilder and hadn't removed it when m

How to chain increasing window operations one after another

2017-05-02 Thread Garrett Barton
Let's say I want to sum values over increasing window sizes of 1, 5, 15, and 60 minutes. Right now I have them running in parallel, meaning if I am producing 1k/sec records I am consuming 4k/sec to feed each calculation. In reality I am calculating far more than sum, and in this pattern I'm looking at som
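
One possible cascade, sketched against the 0.10.2 DSL: aggregate the raw stream once at the finest granularity, then have each coarser window consume the previous stage's output through an intermediate topic. Topic and store names here are hypothetical; this works for sums because sums compose:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.TimeWindows;

    KStreamBuilder builder = new KStreamBuilder();

    // Stage 1: sum the raw stream into 1-minute windows, emit to an intermediate topic.
    KStream<String, Long> raw = builder.stream(Serdes.String(), Serdes.Long(), "values");
    raw.groupByKey(Serdes.String(), Serdes.Long())
       .reduce((a, b) -> a + b, TimeWindows.of(60_000L), "sum-1m")
       .toStream()
       .map((windowedKey, sum) -> KeyValue.pair(windowedKey.key(), sum))
       .to(Serdes.String(), Serdes.Long(), "sums-1m");

    // Stage 2: sum the 1-minute sums into 5-minute windows; 15m and 60m follow the same shape.
    builder.stream(Serdes.String(), Serdes.Long(), "sums-1m")
           .groupByKey(Serdes.String(), Serdes.Long())
           .reduce((a, b) -> a + b, TimeWindows.of(5 * 60_000L), "sum-5m");

One caveat: each stage's output records carry the timestamp of the record that last updated the window, so downstream windows line up only approximately; a timestamp extractor that returns the window start time would tighten this.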

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Thanks Damian, Does setting log.retention.hours have anything to do with compacted topics? Meaning, would a topic not compact now for 90 days? I am thinking of all the internal topics that streams creates in the flow. Having to recover through 90 days of logs would take a good while, I'd imagine. Than

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Damian Guy
Hi Garret, No, log.retention.hours doesn't impact compacted topics. Thanks, Damian On Tue, 2 May 2017 at 18:06 Garrett Barton wrote: > Thanks Damian, > > Does setting log.retention.hours have anything to do with compacted > topics? Meaning would a topic not compact now for 90 days? I am think

Re: Kafka Stream stops polling new messages

2017-05-02 Thread Matthias J. Sax
Did you check the logs? Maybe you need to increase log level to DEBUG to get some more information. Did you double check committed offsets via bin/kafka-consumer-groups.sh? -Matthias On 4/28/17 9:22 AM, João Peixoto wrote: > My stream gets stale after a while and it simply does not receive any n
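
For reference, checking committed offsets on 0.10.x looks roughly like this; the group name equals the Streams application.id, and my-streams-app is a placeholder:

    bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 \
        --describe --group my-streams-app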

Upgrading from kafka_2.9.2-0.8.1.1 to kafka_2.11-0.10.0.0

2017-05-02 Thread Milind Vaidya
Hi We are required to upgrade this in our production system. We have observed some data loss on the producer side and want to try out the new producer with the flush() API. The procedure looks like the following, as per the documents: 1. Upgrade the cluster (brokers) with a rolling upgrade. 2. Upgrade clients. I
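
A minimal sketch of the new producer with an explicit flush(), assuming String keys and values and a placeholder topic name:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all"); // wait for full acknowledgement, reducing loss on the producer side
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("events", "key", "value"));
    producer.flush(); // blocks until all buffered records have completed
    producer.close();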

Re: session window bug not fixed in 0.10.2.1?

2017-05-02 Thread Eno Thereska
Hi Ara, The PR https://github.com/apache/kafka/pull/2645 has gone to both trunk and 0.10.2.1, I just checked. What error are you seeing, could you give us an update? Thanks Eno On Fri, Apr 28, 2017 at 7:10 PM, Ara Ebrahimi wrote: > Hi, > > I upgraded to 0.10.2.1 yesterday, enabled caching for

Re: session window bug not fixed in 0.10.2.1?

2017-05-02 Thread Ara Ebrahimi
No errors. But if I enable caching I see performance drop considerably. The workaround was to disable caching. The same thing is still true in 0.10.2.1. Ara. > On May 2, 2017, at 12:55 PM, Eno Thereska wrote: > > Hi Ara, > > The PR https://github.com/apache/kafka/pull/2645 has gone to both trunk
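
For completeness, the workaround Ara describes maps to a single Streams setting (available since 0.10.1); disabling the record cache trades more downstream traffic for lower per-record latency:

    import org.apache.kafka.streams.StreamsConfig;

    // Disable the record cache entirely.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);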

why is it called kafka?

2017-05-02 Thread The Real Plato
Searching Google and the docs has not revealed the answer -- -plato

Re: why is it called kafka?

2017-05-02 Thread Vahid S Hashemian
This might help: https://www.quora.com/How-did-Kafka-get-its-name --Vahid From: The Real Plato To: users@kafka.apache.org Date: 05/02/2017 07:20 PM Subject: why is it called kafka? searching google and the docs have not revealed the answer -- -plato

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread João Peixoto
Out of curiosity, would this mean that a state store for such a window could hold 90 days' worth of data in memory? Or on the filesystem, if we're talking about RocksDB? On Tue, May 2, 2017 at 10:08 AM Damian Guy wrote: > Hi Garret, > > No, log.retention.hours doesn't impact compacted topics. > > Thanks,
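
To João's question: the default windowed stores are RocksDB-backed, so retained windows live on disk with a bounded in-memory cache, not on the heap. How long they are retained is bounded by the window's until() setting, sketched here against the 0.10.2 API:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.kstream.TimeWindows;

    // 5-minute windows retained for 90 days; the store can therefore hold
    // up to ~90 days of window state, on disk under RocksDB.
    TimeWindows windows = TimeWindows.of(TimeUnit.MINUTES.toMillis(5))
                                     .until(TimeUnit.DAYS.toMillis(90));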

Re: Kafka Stream stops polling new messages

2017-05-02 Thread João Peixoto
I believe I found the root cause of my problem. I seem to have hit this RocksDB bug https://github.com/facebook/rocksdb/issues/1121 In my stream configuration I have a custom transformer used for deduplicating records, heavily inspired by the EventDeduplicationLambdaIntegrationTest
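
For readers unfamiliar with the pattern, a compact sketch of a store-backed deduplication Transformer against the 0.10.x API; the store name "dedup-store" and the use of the record key as the dedup id are assumptions, not João's actual code:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class DeduplicationTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {
        private KeyValueStore<K, Long> seen;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            seen = (KeyValueStore<K, Long>) context.getStateStore("dedup-store");
        }

        @Override
        public KeyValue<K, V> transform(K key, V value) {
            if (seen.get(key) != null) {
                return null; // duplicate: drop it
            }
            seen.put(key, System.currentTimeMillis());
            return KeyValue.pair(key, value);
        }

        @Override
        public KeyValue<K, V> punctuate(long timestamp) {
            return null; // nothing scheduled (the 0.10.x Transformer still declares this method)
        }

        @Override
        public void close() {}
    }

A production version would also expire old entries, e.g. with a window store as the referenced integration test does, otherwise the store grows without bound.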

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Mahendra Kariya
Hi Garrett, Thanks for these insights. But we are not consuming old data. We want the Streams app to run in near real time. And that is how it is actually running. The lag never increases beyond a certain limit. So I don't think that's an issue. The values of the configs that you are mentioning a