Get last N events from Kafka topic

2017-05-08 Thread Buntu Dev
I have a Kafka topic with 10 partitions and am looking for ways to retrieve the last N (say 100K) events from across the partitions using the system tools in 0.10.2. Thanks!
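One way to approach this is per-partition offset arithmetic: back off roughly N divided by the partition count from each partition's end offset, then seek there. The sketch below shows only that arithmetic as plain Java (the class name, the two-partition numbers, and the even split are illustrative; real code would get the offsets from `KafkaConsumer#beginningOffsets`/`#endOffsets` and position with `#seek`):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: compute per-partition start offsets to fetch roughly the last N
// events across all partitions. Keys are partition numbers, values offsets.
public class LastNOffsets {
    // Back off N/partitionCount events from each partition's end offset,
    // clamped so we never seek before the partition's beginning offset.
    public static Map<Integer, Long> startOffsets(Map<Integer, Long> begin,
                                                  Map<Integer, Long> end,
                                                  long n) {
        long perPartition = n / end.size();
        Map<Integer, Long> starts = new HashMap<>();
        for (Map.Entry<Integer, Long> e : end.entrySet()) {
            long start = Math.max(begin.get(e.getKey()), e.getValue() - perPartition);
            starts.put(e.getKey(), start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up offsets for a two-partition example; partition 0 is short,
        // so it contributes everything it has.
        Map<Integer, Long> begin = Map.of(0, 0L, 1, 5000L);
        Map<Integer, Long> end = Map.of(0, 3000L, 1, 90_000L);
        System.out.println(startOffsets(begin, end, 100_000));
    }
}
```

Note the even split can under-deliver when some partitions are short (as with partition 0 above); a stricter "exactly last N" would need a second pass redistributing the shortfall.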

Re: How to chain increasing window operations one after another

2017-05-08 Thread Michal Borowiecki
Apologies, in the code snippet of course only the oneMinuteWindowed KTable will have a Windowed key (KTable<Windowed<Key>, Value>); all others would be just KTable<Key, Value>. Michał On 07/05/17 16:09, Michal Borowiecki wrote: Hi Garrett, I've encountered a similar challenge in a project I'm working on (it's

In kafka(0.10.1.0) with authentication using SASL/PLAIN, enable.auto.commit=false doesn't work

2017-05-08 Thread 齐雪婷
Hi, I am an employee at Sina, in China. Recently I was using Kafka, version 0.10.1.0, which is authenticated with SASL/PLAIN, but when I try to control offsets manually, a problem appears. When I turn off SASL/PLAIN, enable.auto.commit=false does work, but when I turn on SASL/PLAIN
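For reference, a minimal sketch of the consumer settings this report involves, using the standard 0.10.x client config keys (broker address and group id below are placeholders, and the JAAS credentials are assumed to be supplied separately):

```java
import java.util.Properties;

// Sketch: consumer configuration for manual offset control alongside
// SASL/PLAIN. With enable.auto.commit=false, offsets should only advance
// when the application calls commitSync()/commitAsync() itself.
public class ManualCommitConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder
        props.put("group.id", "manual-commit-demo");     // placeholder
        // Disable auto commit so offsets are controlled manually.
        props.put("enable.auto.commit", "false");
        // SASL/PLAIN over a non-TLS connection, as in the 0.10.x docs.
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("enable.auto.commit")); // prints false
    }
}
```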

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-08 Thread João Peixoto
Thanks for the feedback. Here is additional information: * The stream instances are deployed on kubernetes through deployments. I do not know if they use emptyDir, hostPath or EBS * The instances have 2 cores minimum Good advice on the state stores, I already had some of those configurations, but

Re: Shouldn't the initializer of a stream aggregate accept the key?

2017-05-08 Thread João Peixoto
Thanks for the feedback. In the case of an aggregator I think it is simpler since we already have access to the key in the "Aggregator" implementation, meaning that we can already do something wrong with the current API if we're not paying attention. At any rate I'll wait for new developments. O

Kafka log.message.format.version and consumer client versions

2017-05-08 Thread Dave Hamilton
Hi, I have a question about the performance implications of upgrading the Kafka message format relating to the following from the upgrade documentation: The message format in 0.10.0 includes a new timestamp field and uses relative offsets for compressed messages. The on disk message format can b

Re: Kafka Streams - Handling HARD Deletes From RDBMS

2017-05-08 Thread Guozhang Wang
Hello, For change capture streams from JDBC that may include deletes, a common suggestion is to represent them as KTables (i.e. a changelog stream) instead of KStreams, since the former will override older values for the same key with the newer record in the stream. Then when you are doing t
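The changelog semantics described here can be illustrated with a plain map (an illustrative stand-in, not Kafka Streams code): a KTable materializes the stream such that newer records overwrite older ones per key, and a null value, the tombstone CDC tools typically emit for a hard delete, removes the key entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: how a KTable materializes a changelog stream. Each record is a
// {key, value} pair; value == null is a tombstone for a hard delete.
public class ChangelogTable {
    public static Map<String, String> materialize(String[][] changelog) {
        Map<String, String> table = new HashMap<>();
        for (String[] record : changelog) {
            String key = record[0], value = record[1];
            if (value == null) {
                table.remove(key);       // hard delete from the source DB
            } else {
                table.put(key, value);   // insert or update overwrites
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String[][] changelog = {
            {"user1", "alice"}, {"user2", "bob"},
            {"user1", "alice2"},   // update: newer record wins for user1
            {"user2", null},       // delete: tombstone removes user2
        };
        System.out.println(materialize(changelog)); // prints {user1=alice2}
    }
}
```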

Streams Processing Meetup @LinkedIn

2017-05-08 Thread Becket Qin
Hello! LinkedIn will be hosting a Streams Processing meetup on Wednesday May 24, 6-9PM in our Sunnyvale HQ. We'll have 3 exciting talks planned: - Streaming Data Pipelines with Brooklin - Kafka at Half the Price

[DISCUSS] KIP-156 Add option "dry run" to Streams application reset tool

2017-05-08 Thread BigData dev
Hi All, I want to start a discussion on this simple KIP for the Kafka Streams reset tool (kafka-streams-application-reset.sh). https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69410150 Thank you, Matthias J. Sax, for providing the Jira and info to work on. Thanks, Bharat

Re: [DISCUSS] KIP-156 Add option "dry run" to Streams application reset tool

2017-05-08 Thread Steven Schlansker
> On May 8, 2017, at 11:14 AM, BigData dev wrote: > > Hi All, > I want to start a discussion on this simple KIP for Kafka Streams reset > tool (kafka-streams-application-reset.sh). > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69410150 I've not used this tool, but if I were

Re: Kafka Streams stopped with errors, failed to reinitialize itself

2017-05-08 Thread Guozhang Wang
Hi Sameer, I looked at the logs, and there is only one suspicious entry: ``` 2017-05-03 14:26:54 WARN StreamThread:1184 - Could not create task 0_21. Will retry. org.apache.kafka.streams.errors.LockException: task [0_21] Failed to lock the state directory: /data/streampoc/LIC2-4/0_21 ``` It rep

Re: KafkaStreams pause specific topic partition consumption

2017-05-08 Thread Timur Yusupov
Matthias, Thanks for your answers. >> So we are considering to just pause specific >> topic partitions as soon as we arrive to stop offsets for them. >I am just wondering how you would do this in a fault-tolerant way (if you would have pause API)? On start of batch cycle we have to store somewher

Re: [DISCUSS] KIP-156 Add option "dry run" to Streams application reset tool

2017-05-08 Thread Matthias J. Sax
Thanks for the KIP Bharat! I like it. Don't have anything to add so far. Seems to be fairly straight forward. Looking forward for further comments from the community. -Matthias On 5/8/17 11:17 AM, Steven Schlansker wrote: > >> On May 8, 2017, at 11:14 AM, BigData dev wrote: >> >> Hi All, >>

Re: How to chain increasing window operations one after another

2017-05-08 Thread Matthias J. Sax
Michal, that's an interesting idea. In an ideal world, Kafka Streams should have an optimizer that is able to do this automatically under the hood. Too bad we are not there yet. @Garrett: did you try this out? This seems to be a question that might affect many users, and it might be worth to docu

Re: In kafka(0.10.1.0) with authentication using SASL/PLAIN, enable.auto.commit=false doesn't work

2017-05-08 Thread Guozhang Wang
Hello Xueting, This issue is not expected in 0.10.1.0, i.e. the config should not be affected by the security feature like SASL at all. Could you describe your code snippet how did you do the manual offset commit, and how did you determine that the auto commit is still taking effect? Guozhang

Re: Kafka Stream stops polling new messages

2017-05-08 Thread Matthias J. Sax
Hey, I am not against opening a JIRA, but I am wondering what we should describe/report there. If I understand the scenario correctly, João uses a custom RocksDB store and calls seek() in user code land. As it is a bug in RocksDB that seek takes so long, I am not sure what we could improve within

Re: How to chain increasing window operations one after another

2017-05-08 Thread Garrett Barton
Michael, This is slick! I am still writing unit tests to verify it. My code looks something like: KTable<Windowed<String>, CountSumMinMaxAvgObj> oneMinuteWindowed = srcStream // my val object isn't really called that, just wanted to show a sample set of calculations the value can do! .groupByKey(Serdes.S
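The chaining idea in this thread can be shown with plain window-start arithmetic (an illustrative stand-in for the Streams DSL; the `rollUp` helper, the counts, and millisecond window-start keys are made up): one-minute aggregates roll up into five-minute ones by aligning each one-minute window start down to its enclosing coarser window.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: roll one-minute window aggregates up into coarser windows instead
// of re-aggregating the raw stream per window size. Keys are window start
// times in epoch millis; values are counts.
public class WindowRollup {
    static final long MINUTE = 60_000L;

    public static Map<Long, Long> rollUp(Map<Long, Long> oneMinuteCounts, int factor) {
        long size = factor * MINUTE;
        Map<Long, Long> coarse = new TreeMap<>();
        for (Map.Entry<Long, Long> e : oneMinuteCounts.entrySet()) {
            // Align the fine window start down to its coarse window start.
            long coarseStart = e.getKey() - (e.getKey() % size);
            coarse.merge(coarseStart, e.getValue(), Long::sum);
        }
        return coarse;
    }

    public static void main(String[] args) {
        Map<Long, Long> perMinute = new TreeMap<>();
        perMinute.put(0L, 3L);          // 00:00
        perMinute.put(MINUTE, 2L);      // 00:01
        perMinute.put(5 * MINUTE, 7L);  // 00:05
        System.out.println(rollUp(perMinute, 5)); // prints {0=5, 300000=7}
    }
}
```

The same alignment generalizes to the 15-minute and hourly layers by chaining `rollUp` with factors 3 and 4 over the previous layer's output.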

Re: Deduplicating KStream-KStream join

2017-05-08 Thread Matthias J. Sax
Offir, Yes, KTable would not be 100% reliable: > The simplest way to work around this within Streams would be to use a > KTable -- but it might not be 100% guaranteed that all duplicates get > filtered (as we flush on commit) Your suggestion (1) to add a TTL is unfortunately not easy to implem
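For completeness, the TTL-based dedup idea from suggestion (1) can be sketched outside Streams with a plain in-memory map (illustrative only; the class and its API are made up, and a fault-tolerant Streams version would need a windowed state store rather than a heap map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: application-level deduplication with a TTL. A key counts as a
// duplicate only if it was already seen within the last ttlMs milliseconds.
public class TtlDeduplicator {
    private final long ttlMs;
    private final Map<String, Long> seenAt = new HashMap<>();

    public TtlDeduplicator(long ttlMs) { this.ttlMs = ttlMs; }

    // Returns true the first time a key is seen within the TTL window.
    public boolean firstSeen(String key, long nowMs) {
        // Evict entries older than the TTL so the map stays bounded.
        seenAt.values().removeIf(ts -> nowMs - ts > ttlMs);
        return seenAt.putIfAbsent(key, nowMs) == null;
    }

    public static void main(String[] args) {
        TtlDeduplicator dedup = new TtlDeduplicator(1000);
        System.out.println(dedup.firstSeen("a", 0));    // true: new key
        System.out.println(dedup.firstSeen("a", 500));  // false: duplicate
        System.out.println(dedup.firstSeen("a", 2000)); // true: TTL expired
    }
}
```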

Re: Debugging Kafka Streams Windowing

2017-05-08 Thread Matthias J. Sax
Great! Glad 0.10.2.1 fixes it for you! -Matthias On 5/7/17 8:57 PM, Mahendra Kariya wrote: > Upgrading to 0.10.2.1 seems to have fixed the issue. > > Until now, we were looking at random 1 hour data to analyse the issue. Over > the weekend, we have written a simple test that will continuously ch

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-08 Thread Guozhang Wang
Hello, Just to add a few more pointers: there are a few improvements we have added in trunk that we are considering to also piggy-back onto 0.10.2 in case we can have a 0.10.2.2 release, and one of them would help with this case: https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+D

Re: KafkaStreams pause specific topic partition consumption

2017-05-08 Thread Matthias J. Sax
I see. If you do the step of storing the end offsets in your database before starting up Streams, this would work. What you could do as a workaround (even if it might not be a nice solution) is to apply a `transform()` as your first operator. Within `transform()` you get access to the current r
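The `transform()` workaround can be sketched as plain Java (the class and its API are illustrative, not Kafka Streams code): records pass through until their partition reaches its stored stop offset, after which they are dropped. In a real topology this check would sit inside a Transformer, whose context exposes the record's partition and offset, with the stop offsets loaded from the external database before startup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: filter records against per-partition stop offsets captured at the
// start of a batch cycle.
public class StopOffsetFilter {
    private final Map<Integer, Long> stopOffsets;

    public StopOffsetFilter(Map<Integer, Long> stopOffsets) {
        this.stopOffsets = new HashMap<>(stopOffsets);
    }

    // Keep the record only while it is before the partition's stop offset;
    // partitions without a stored stop offset are never filtered.
    public boolean accept(int partition, long offset) {
        return offset < stopOffsets.getOrDefault(partition, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        StopOffsetFilter filter = new StopOffsetFilter(Map.of(0, 100L));
        System.out.println(filter.accept(0, 99));  // true: within the batch
        System.out.println(filter.accept(0, 100)); // false: boundary reached
    }
}
```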

Re: How to chain increasing window operations one after another

2017-05-08 Thread Matthias J. Sax
Thinking about this once more (and also having a fresh memory of another thread about KTables), I am wondering if this approach needs some extra tuning: as the first window aggregation produces an output stream with an unbounded key space, the following (non-windowed) KTables would grow

Re: KafkaStreams pause specific topic partition consumption

2017-05-08 Thread Timur Yusupov
That means in order to process filtered out records in a next batch, we have to seek KafkaStreams back, right? On Tue, May 9, 2017 at 1:19 AM, Matthias J. Sax wrote: > I see. > > I you do the step of storing the end offsets in your database before > starting up Streams this would work. > > What

Re: KafkaStreams pause specific topic partition consumption

2017-05-08 Thread Matthias J. Sax
Yes. That is something you would need to do externally, too. There is a KIP for a tool (https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling) -- but you can also do this using a single `KafkaConsumer` with `group.id == application.id` that gets all par

Re: KafkaStreams pause specific topic partition consumption

2017-05-08 Thread Timur Yusupov
Got it, thanks, Matthias! On Tue, May 9, 2017 at 2:07 AM, Matthias J. Sax wrote: > Yes. That is something you would need to do external too. > > There is a KIP for a tool > (https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 122%3A+Add+Reset+Consumer+Group+Offsets+tooling) > -- but you can

Re: session window bug not fixed in 0.10.2.1?

2017-05-08 Thread Ara Ebrahimi
Well, I can tell you this: Performance is good initially but then it degrades massively after processing a few billion records in a few hours. It becomes quite “choppy”. A little bit of data is processed, then absolute silence, then another small chunk processed, and so on. I did enable rocksdb

producer and consumer sample code

2017-05-08 Thread Adaryl Wakefield
Does anybody have a really good example of some producers and consumers that they have written that they would be willing to share? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData

Re: producer and consumer sample code

2017-05-08 Thread BigData dev
Hi Adaryl, There are samples of Java producers/consumers in the Apache Kafka source code repo. https://github.com/apache/kafka/tree/trunk/examples/src/main/java/kafka/examples And there is also an IBM Hadoop Dev blog to get started with Kafka. The code sample is attached to the blog. https://develope

Re: Kafka log.message.format.version and consumer client versions

2017-05-08 Thread Manikumar
Yes, it is sufficient to upgrade all consumers to the new version. No need to switch from the Scala APIs to the Java APIs. On Mon, May 8, 2017 at 10:03 PM, Dave Hamilton wrote: > Hi, I have a question about the performance implications of upgrading the > Kafka message format relating to the following from the up

Error: Executing consumer group command failed due to Request GROUP_COORDINATOR failed on brokers List(localhost:9092 (id: -1 rack: null))

2017-05-08 Thread Abhimanyu Nagrath
I am using single-node Kafka 0.10.2, and while running this command *./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group * on my Kafka broker server, 80% of the time it throws this error. Error: Executing consumer group command failed due to Request GROUP_COORDINATOR failed o

WARN Attempting to send response via channel for which there is no open connection, connection id 0 (kafka.network.Processor)

2017-05-08 Thread Abhimanyu Nagrath
Hi, I am using single-node Kafka 0.10.2, and my server.log file is full of this warning: *WARN Attempting to send response via channel for which there is no open connection, connection id 0 (kafka.network.Processor)*. Can anyone suggest how to fix this warning? Regards, Abhimanyu

Kafka 0.9.0.1 Direct Memory OOM

2017-05-08 Thread JsonTu
Hi All, We have a cluster with 6 nodes. We have hit a direct buffer memory OOM in our prod environment. The default Kafka JVM config is used in our cluster. The error is like below: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658)
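One knob worth checking in this situation: the direct buffer pool is capped by the JVM flag `-XX:MaxDirectMemorySize`, and with the default Kafka JVM settings no explicit cap is set. A sketch of raising it via the environment variables the Kafka startup scripts honor (the sizes below are illustrative values, not recommendations, and this only mitigates the symptom, not a buffer leak):

```shell
# kafka-server-start.sh reads KAFKA_HEAP_OPTS; kafka-run-class.sh appends
# KAFKA_OPTS to the JVM command line. Sizes here are placeholders.
export KAFKA_HEAP_OPTS="-Xmx4g -Xms4g"
export KAFKA_OPTS="-XX:MaxDirectMemorySize=1g"
bin/kafka-server-start.sh config/server.properties
```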