Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Oh good point! The reason there is only one row corresponding to each time window is that it contains only the latest value for that window. So what we did was dump the data present in the sink topic to a db using an upsert query. The primary key of the table was the time window.
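
The upsert approach described above can be sketched in plain Java (the actual db table and query are not shown in the thread, so this only mirrors the semantics): keying by window start means a later count for the same window simply overwrites the earlier one, leaving one row per window.

```java
import java.util.Map;
import java.util.TreeMap;

public class WindowUpsertSketch {
    // Emulates an upsert keyed by window start: the latest count wins.
    static final Map<Long, Long> table = new TreeMap<>();

    static void upsert(long windowStartMs, long count) {
        table.put(windowStartMs, count); // INSERT ... ON CONFLICT UPDATE, in db terms
    }

    public static void main(String[] args) {
        upsert(0L, 10);       // early (partial) result for window [0, 60000)
        upsert(0L, 42);       // later result for the same window overwrites it
        upsert(60_000L, 7);   // next window
        System.out.println(table); // prints {0=42, 60000=7}
    }
}
```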

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Matthias J. Sax
Thanks for the details (sorry that I forgot that you did share the output already). Might be a dumb question, but what is the count for missing windows in your second implementation? If there is no data for a window, it should not emit a window with count zero, but nothing. Thus, looking at you

help on producing from local laptop to ec2 via jump

2017-04-27 Thread Robert Towne
Hi, all. I am trying to produce on my local dev box to an ec2 broker via an ec2 jump server. I have tried with the following topology below using port forwarding w/no success: *Setup* local workstation: local EC2 Jump Server:server1 EC2 Broker : serve
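
A common cause of this symptom is the broker advertising its internal EC2 address: the client connects through the tunnel, but the broker's metadata response points the producer back at the unreachable internal hostname. A sketch of the relevant server.properties entries (the hostnames and ports here are placeholders, not taken from the thread):

```properties
# Address the broker binds to inside EC2
listeners=PLAINTEXT://0.0.0.0:9092
# Address clients are told to connect to -- must be reachable
# from the laptop, i.e. the forwarded/jump-server endpoint
advertised.listeners=PLAINTEXT://localhost:9092
```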

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
> Can you somehow verify your output? Do you mean the Kafka streams output? In the Kafka Streams output, we do see some missing values. I have attached the Kafka Streams output (for a few hours) in the very first email of this thread for reference. Let me also summarise what we have done so far.

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Matthias J. Sax
Thanks for reporting back! As Eno mentioned, we do have a JIRA for the same report as yours: https://issues.apache.org/jira/browse/KAFKA-5055 We are investigating... Can you somehow verify your output? Do you see missing values (not sure how easy it is to verify -- we are not sure yet if the repo

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Hi Matthias, We changed our timestamp extractor code to this. public long extract(ConsumerRecord record, long previousTimestamp) { Message message = (Message) record.value(); long timeInMillis = Timestamps.toMillis(message.getEventTimestamp()); if (timeInMillis < 0) { LOGGER.
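
The full class above implements Kafka Streams' TimestampExtractor (which needs the kafka-streams jar and is cut off in the snippet). This self-contained sketch isolates just the fallback decision being described; falling back to the previous record's timestamp is an assumption, since the branch after the LOGGER call is truncated.

```java
public class TimestampFallbackSketch {
    // Returns the event-time if valid; otherwise falls back to the
    // previous record's timestamp so the record is not skipped.
    static long resolveTimestamp(long eventTimeMs, long previousTimestampMs) {
        if (eventTimeMs < 0) {
            // The real extractor would also log a warning here.
            return previousTimestampMs;
        }
        return eventTimeMs;
    }

    public static void main(String[] args) {
        System.out.println(resolveTimestamp(1_493_300_000_000L, 0L));  // valid -> event time
        System.out.println(resolveTimestamp(-1L, 1_493_300_000_000L)); // invalid -> fallback
    }
}
```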

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Matthias J. Sax
>> I'd like to avoid repeated trips to the db, and caching a large amount of >> data in memory. Lookups to the DB would be hard to get done anyway. I.e., it would not perform well, as all your calls would need to be synchronous... >> Is it possible to send a message w/ the id as the partition key

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Ali Akhtar
I'd like to avoid repeated trips to the db, and caching a large amount of data in memory. Is it possible to send a message w/ the id as the partition key to a topic, and then use the same id as the key, so the same node which will receive the data for an id is the one which will process it? On F

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Matthias J. Sax
The recommended solution would be to use Kafka Connect to load your DB data into a Kafka topic. With Kafka Streams you read your db-topic as KTable and do an (inner) KStream-KTable join to lookup the IDs. -Matthias On 4/27/17 2:22 PM, Ali Akhtar wrote: > I have a Kafka topic which will receive a l
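
The real topology would use the kafka-streams DSL (stream.join(table, ...)); this plain-Java sketch only mirrors the inner-join semantics being recommended: a stream record whose id has no entry in the table is dropped, which is exactly the "ignore untracked ids" behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InnerJoinSketch {
    // Inner KStream-KTable join semantics: keep a stream record only
    // when its key exists in the table; pair it with the table value.
    static List<String> join(List<String[]> stream, Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (String[] rec : stream) {            // rec = {id, payload}
            String tableValue = table.get(rec[0]);
            if (tableValue != null) {            // inner join: no match -> drop
                out.add(rec[0] + ":" + rec[1] + "/" + tableValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tracked = new HashMap<>();
        tracked.put("id-1", "tracked");
        List<String[]> stream = List.of(
                new String[]{"id-1", "event-a"},   // tracked -> kept
                new String[]{"id-9", "event-b"});  // untracked -> dropped
        System.out.println(join(stream, tracked)); // prints [id-1:event-a/tracked]
    }
}
```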

Re: Deployment of Kafka Stream app

2017-04-27 Thread Matthias J. Sax
In a containerized environment, if a container fails it will get restarted -- so you get HA out of the box. For scaling, you have a single docker image, but start the same image multiple times. If you have multiple apps, you should have one image per app and start the number of containers

Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Ali Akhtar
I have a Kafka topic which will receive a large amount of data. This data has an 'id' field. I need to look up the id in an external db, see if we are tracking that id, and if yes, we process that message, if not, we ignore it. 99% of the data will be for ids which are not being tracked - 1% or s

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Matthias J. Sax
Streams skips records with timestamp -1. The metric you mentioned reports the number of skipped records. Are you sure that `getEventTimestamp()` never returns -1 ? -Matthias On 4/27/17 10:33 AM, Mahendra Kariya wrote: > Hey Eno, > > We are using a custom TimeStampExtractor class. All messages

Deployment of Kafka Stream app

2017-04-27 Thread Mina Aslani
I understand that for dev env creating containers might not be needed, and as you said "start up an application and go"! However, I would like to know what setup I need to get HA and scalability in my env. - Would every Kafka streaming job/app require a new docker imag

[ANNOUNCE] Apache Kafka 0.10.2.1 Released

2017-04-27 Thread Gwen Shapira
The Apache Kafka community is pleased to announce the release for Apache Kafka 0.10.2.1. This is a bug fix release that fixes 29 issues in 0.10.2.0. All of the changes in this release can be found in the release notes: https://archive.apache.org/dist/kafka/0.10.2.1/RELEASE_NOTES.html

Controller connection failures

2017-04-27 Thread Chuck Musser
I'm running into a problem with a 3 broker cluster where, intermittently, one of the broker's controller begins to report that it cannot connect to the other brokers and repeatedly logs the failure. Each broker is running in its own Docker container on separate machines. These Docker containers h

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Hey Eno, We are using a custom TimeStampExtractor class. All messages that we have in Kafka has a timestamp field. That is what we are using. The code looks like this. public long extract(ConsumerRecord record, long previousTimestamp) { Message message = (Message) record.value(); return T

Re: KafkaStreams pause specific topic partition consumption

2017-04-27 Thread Matthias J. Sax
Timur, there is no API to pause/resume partitions in Streams, because Streams handles/manages its internal consumer by itself. The "batch processing KIP" is currently delayed -- but I am sure we will pick it up again. Hopefully after 0.11 got released. > So we are considering to just pause spec

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Eno Thereska
Hi Mahendra, We are currently looking at the skipped-records-rate metric as part of https://issues.apache.org/jira/browse/KAFKA-5055 . Could you let us know if you use any special TimeStampExtractor class, or if it is the default? Thanks Eno >

Re: How to implement use case

2017-04-27 Thread Steven Schlansker
> On Apr 27, 2017, at 3:25 AM, Vladimir Lalovic wrote: > > Hi all, > > > > Our system is about ride reservations and acts as broker between customers > and drivers. > ... > Most of our rules are function of time and some reservation’s property > (e.g. check if there are any reservations whe

Re: Kafka Stream vs Spark

2017-04-27 Thread David Garcia
Unlike spark, you don’t need an entire framework to deploy your job. With Kstreams, you just start up an application and go. You don’t need docker either…although containerizing your stuff is probably a good strategy for the purposes of deployment management (something you get with Yarn or a s

Kafka Stream vs Spark

2017-04-27 Thread Mina Aslani
Hi, I created a kafka stream app and as I was informed I created a docker image with the app and launched it as a container. However, I have a couple of questions: - Would every Kafka streaming job require a new docker image and deployment of the container/service? - How should I structure things d

KafkaStreams pause specific topic partition consumption

2017-04-27 Thread Timur Yusupov
I see it is possible to pause specific topic partition consumption when using KafkaConsumer directly, but looks like it is not possible when using KafkaStreams. There are following use cases for that: 1) Doing batch processing using Kafka Streams (I found https://cwiki.apache.org/confluence/displa

How to implement use case

2017-04-27 Thread Vladimir Lalovic
Hi all, Our system is about ride reservations and acts as broker between customers and drivers. Something similar to what Uber does, with the major difference that we are mostly focused on reservations scheduled in advance. So between the moment when a reservation is created and until the reservation/ride is ac

Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Hey All, We have a Kafka Streams application which ingests from a topic to which more than 15K messages are generated per second. The app filters a few of them, counts the number of unique filtered messages (based on one particular field) within a 1 min time window, and dumps it back to Kafka. Th
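
The computation described above (count of unique field values per 1-minute window) can be sketched in plain Java; the real app uses Kafka Streams' windowed aggregation, and the field values here are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WindowedUniqueCountSketch {
    static final long WINDOW_MS = 60_000L;

    // Maps window start -> set of distinct field values seen in that window.
    static Map<Long, Set<String>> bucket(long[] timestamps, String[] fieldValues) {
        Map<Long, Set<String>> windows = new HashMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = (timestamps[i] / WINDOW_MS) * WINDOW_MS; // tumbling window
            windows.computeIfAbsent(windowStart, k -> new HashSet<>())
                   .add(fieldValues[i]);
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] ts = {1_000L, 2_000L, 61_000L};  // first two fall in window 0, third in window 60000
        String[] vals = {"u1", "u1", "u2"};     // duplicate "u1" counted once in its window
        bucket(ts, vals).forEach((w, s) ->
                System.out.println("window " + w + " -> " + s.size() + " unique"));
    }
}
```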