Re: Kafka docs for current trunk

2017-01-31 Thread Michael Noll
Thanks for bringing this up, Matthias. +1 On Wed, Feb 1, 2017 at 8:15 AM, Gwen Shapira wrote: > +1 > > On Tue, Jan 31, 2017 at 5:57 PM, Matthias J. Sax > wrote: > > Hi, > > > > I want to collect feedback about the idea to publish docs for current > > trunk version of Apache Kafka. > > > > Curr

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-07 Thread Michael Noll
Many thanks for the KIP and the PR, Steven! My opinion, too, is that we should consider including this. One thing that I would like to see clarified is the difference between the proposed peek() and existing functions map() and foreach(), for instance. My understanding (see also the Java 8 links
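For illustration, a hedged sketch of the distinction being asked about, assuming the `peek()` operator proposed in KIP-121 (it was later added in Kafka 0.11) and a pre-existing `KStreamBuilder builder`; topic names are placeholders. `peek()` performs a side effect and returns the same stream, `mapValues()` transforms records, and `foreach()` is terminal:

```java
KStream<String, String> input = builder.stream("input-topic");

input
    .peek((key, value) -> System.out.println("observed: " + key))  // side effect, stream continues
    .mapValues(value -> value.toUpperCase())                        // transformation
    .to("output-topic");

input.foreach((key, value) -> System.out.println("seen: " + key));  // terminal, returns void
```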

Re: Kafka streams: Getting a state store associated to a processor

2017-02-13 Thread Michael Noll
Adam, also a FYI: The upcoming 0.10.2 version of the Streams API will be backwards compatible with 0.10.1 clusters, so you can keep your brokers on 0.10.1.1 and still use the latest Streams API version (including the one from trunk, as Matthias mentioned). -Michael On Mon, Feb 13, 2017 at 1:04

Re: Kafka streams: Getting a state store associated to a processor

2017-02-14 Thread Michael Noll
> By the way - do I understand correctly that when a state store is persistent, it is logged by default? Yes. > So enableLogging(Map) only is a way to provide default configuration to the default logging? Yes. That is, any configs that should be applied to the state store's changelog topic. >
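A minimal sketch (assuming the 0.10.x `Stores` builder API) of supplying changelog configs via `enableLogging()`; the store name and config values are placeholders, and logging is already on by default for persistent stores:

```java
Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("cleanup.policy", "compact,delete");  // applied to the store's changelog topic
changelogConfig.put("retention.ms", "172800000");

StateStoreSupplier store = Stores.create("my-store")
    .withStringKeys()
    .withStringValues()
    .persistent()
    .enableLogging(changelogConfig)
    .build();
```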

Re: KTable send old values API

2017-02-22 Thread Michael Noll
Dmitry, I think your use case is similar to the one I described in the link below (discussion in the kafka-dev mailing list): http://search-hadoop.com/m/uyzND1rVOQ12OJ84U&subj=Re+Streams+TTLCacheStore Could you take a quick look? -Michael On Wed, Feb 22, 2017 at 12:39 AM, Dmitry Minkovsky w

Re: Simple data-driven app design using Kafka

2017-02-23 Thread Michael Noll
Pete, have you looked at Kafka's Streams API yet? There are many examples available in the `kafka-streams` folder at https://github.com/confluentinc/examples. The simplest example of "Do something to a new data record as soon as it arrives" might be the MapFunctionLambdaExample. You can create differ

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Michael Noll
> Also, is it possible to stop the syncing between state stores to brokers, if I am fine with failures? Yes, you can disable the syncing (or the "changelog" feature) of state stores: http://docs.confluent.io/current/streams/developer-guide.html#enable-disable-state-store-changelogs > I do have a
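A corresponding sketch of switching the changelog off entirely (again assuming the 0.10.x `Stores` API, with a placeholder store name); without a changelog the store cannot be restored from Kafka after a failure:

```java
StateStoreSupplier nonLoggedStore = Stores.create("lookup-store")
    .withStringKeys()
    .withStringValues()
    .persistent()
    .disableLogging()   // no changelog topic, no restoration on failover
    .build();
```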

Re: Kafka Streams vs Spark Streaming

2017-02-28 Thread Michael Noll
size 1 hour and retention of 3 hours. > > So to conclude if you can manage rocks db, then kafka streams is good to > start with, its simple and very intuitive to use. > > Again on rocksdb side, is there a way to eliminate that and is > > disableLogging > > for that? > &g

Re: Chatty StreamThread commit messages

2017-03-01 Thread Michael Noll
Good point, Steven. +1 here. On Wed, Mar 1, 2017 at 8:52 AM, Damian Guy wrote: > +1 > On Wed, 1 Mar 2017 at 07:15, Guozhang Wang wrote: > > > Hey Steven, > > > > That is a good question, and I think your proposal makes sense. Could you > > file a JIRA for this change to keep track of it? > > >

Re: Writing data from kafka-streams to remote database

2017-03-06 Thread Michael Noll
I'd use option 2 (Kafka Connect). Advantages of #2: - The code is decoupled from the processing code and easier to refactor in the future. (same as #4) - The runtime/uptime/scalability of your Kafka Streams app (processing) is decoupled from the runtime/uptime/scalability of the data ingestion in

Re: Kafka streams DSL advantage

2017-03-06 Thread Michael Noll
The DSL has some unique features that aren't in the Processor API, such as: - KStream and KTable abstractions. - Support for time windows (tumbling windows, hopping windows) and session windows. The Processor API only has stream-time based `punctuate()`. - Record caching, which is slightly better
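A short, hedged sketch of two of those DSL-only features (the KTable abstraction plus a tumbling time window); topic name, serdes, window size, and store name are placeholders, and a `KStreamBuilder builder` is assumed:

```java
KStream<String, String> views = builder.stream(Serdes.String(), Serdes.String(), "pageviews");

KTable<Windowed<String>, Long> viewCounts = views
    .groupByKey(Serdes.String(), Serdes.String())
    .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), "pageview-counts");  // 5-minute tumbling windows
```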

Re: Can I create user defined stream processing function/api?

2017-03-07 Thread Michael Noll
There's also an end-to-end example for DSL and Processor API integration: https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/MixAndMatchLambdaIntegrationTest.java Best, Michael On Tue, Mar 7, 2017 at 4:51 PM, LongTian Wang wrote: > Re

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-09 Thread Michael Noll
Thanks for the update, Matthias. +1 to the points 1,2,3,4 you mentioned. Naming is always a tricky subject, but renaming KStreamBuilder to StreamsTopologyBuilder looks ok to me (I would have had a slight preference towards DslTopologyBuilder, but hey.) The most important aspect is, IMHO, what yo

Re: Kafka Streams question

2017-03-09 Thread Michael Noll
There's actually a demo application that demonstrates the simplest use case for Kafka's Streams API: to read data from an input topic and then write that data as-is to an output topic. https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/Pa
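The pass-through topology itself fits in a few lines; here is a minimal sketch with placeholder topic names, application id, and broker address:

```java
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "pass-through-demo");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

KStreamBuilder builder = new KStreamBuilder();
builder.stream("input-topic").to("output-topic");  // read as-is, write as-is

KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
```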

Re: Streams - Got bit by log levels of the record cache

2017-03-10 Thread Michael Noll
I think a related JIRA ticket is https://issues.apache.org/jira/browse/KAFKA-4829 (see Guozhang's comment about the ticket's scope). -Michael On Thu, Mar 9, 2017 at 6:22 PM, Damian Guy wrote: > Hi Nicolas, > > Please do file a JIRA. > > Many thanks, > Damian > > On Thu, 9 Mar 2017 at 15:54 Nic

Re: ~20gb of kafka-streams state, unexpected?

2017-03-10 Thread Michael Noll
In addition to what Eno already mentioned here's some quick feedback: - Only for reference, I'd add that 20GB of state is not necessarily "massive" in absolute terms. I have talked to users whose apps manage much more state than that (1-2 orders of magnitude more). Whether or not 20 GB is massiv

Re: Kafka Streams question

2017-03-14 Thread Michael Noll
application's processing needs. -Michael On Tue, Mar 14, 2017 at 9:00 AM, BYEONG-GI KIM wrote: > Dear Michael Noll, > > I have a question; Is it possible converting JSON format to YAML format via > using Kafka Streams? > > Best Regards > > KIM > > 2017-03-10 11:36 GMT+09

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Michael Noll
I see Jay's point, and I agree with much of it -- notably about being careful which concepts we do and do not expose, depending on which user group / user type is affected. That said, I'm not sure yet whether or not we should get rid of "Topology" (or a similar term) in the DSL. For what it's wor

Re: Trying to use Kafka Stream

2017-03-15 Thread Michael Noll
Mina, in your original question you wrote: > However, I do not see the word count when I try to run below example. Looks like that it does not connect to Kafka. The WordCount demo example writes its output to Kafka only -- it *does not* write any results to the console/STDOUT. From what I can
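Since the demo only writes to Kafka, one way to see the results is to consume its output topic; a hedged sketch (topic name, group id, and deserializers follow the usual WordCount setup but are assumptions here):

```java
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "wordcount-output-reader");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class);

try (KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("streams-wordcount-output"));
    for (ConsumerRecord<String, Long> record : consumer.poll(5000)) {
        System.out.println(record.key() + ": " + record.value());
    }
}
```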

Re: Trying to use Kafka Stream

2017-03-15 Thread Michael Noll
ob/3.2.x/kafka-streams/src/main/ > java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181) > in my IDE was not and still is not working. > > Best regards, > Mina > > > On Wed, Mar 15, 2017 at 4:43 AM, Michael Noll > wrote: > > > Mina, > > > &g

Re: Not Serializable Result Error

2017-03-15 Thread Michael Noll
Hi Armaan, > org.apache.spark.SparkException: Job aborted due to stage failure: >Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord perhaps you should ask that question in the Spark mailing list, which should increase your chances of

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
And since you asked for a pointer, Ali: http://docs.confluent.io/current/streams/concepts.html#windowing On Mon, Mar 20, 2017 at 5:43 PM, Michael Noll wrote: > Late-arriving and out-of-order data is only treated specially for windowed > aggregations. > > For stateless operat

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
Late-arriving and out-of-order data is only treated specially for windowed aggregations. For stateless operations such as `KStream#foreach()` or `KStream#map()`, records are processed in the order they arrive (per partition). -Michael On Sat, Mar 18, 2017 at 10:47 PM, Ali Akhtar wrote: > >
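A tiny sketch of such stateless processing (placeholder topic name, `KStreamBuilder builder` assumed); each record is handled in the order it arrives within its partition:

```java
KStream<String, String> events = builder.stream("events");
events
    .mapValues(value -> value.trim())                                    // 1:1 transformation
    .foreach((key, value) -> System.out.println(key + " -> " + value));  // side effect, in arrival order
```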

Re: clearing an aggregation?

2017-03-20 Thread Michael Noll
Jon, the windowing operation of Kafka's Streams API (in its DSL) aligns time-based windows to the epoch [1]: Quoting from e.g. hopping windows (sometimes called sliding windows in other technologies): > Hopping time windows are aligned to the epoch, with the lower interval bound > being inclusiv
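To illustrate the epoch alignment: a 5-minute hopping window with a 1-minute advance (sizes here are placeholders) produces window start times at multiples of the advance interval counted from the epoch, regardless of when the application was started:

```java
TimeWindows hoppingWindows = TimeWindows
    .of(TimeUnit.MINUTES.toMillis(5))          // window size: 5 minutes
    .advanceBy(TimeUnit.MINUTES.toMillis(1));  // hop: 1 minute, aligned to the epoch
```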

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
t possible to get metadata on the message, such as whether or not > its late, its index/position within the other messages, etc? > > On Mon, Mar 20, 2017 at 9:44 PM, Michael Noll > wrote: > > > And since you asked for a pointer, Ali: > > http://docs.confluent.io/current/st

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
ucket) should be started, and future messages should belong to that > 'session', until the next 30+ min gap). > > On Mon, Mar 20, 2017 at 11:44 PM, Michael Noll > wrote: > > > > Can windows only be used for aggregations, or can they also be used for > > fore

Re: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Michael Noll
d the topology if provided as a constructor argument. However, > >> especially for DSL (not sure if it would make sense for PAPI), the DSL > >> builder could create the client for the user. > >> > >> Something like this: > >> > >>> KStreamBuilder bu

Re: Using Kafka Stream in the cluster of kafka on one or multiple docker-machine/s

2017-03-21 Thread Michael Noll
Typically you'd containerize your app and then launch e.g. 10 containers if you need to run 10 instances of your app. Also, what do you mean by "in a cluster of Kafka containers" and "in the cluster of Kafkas"? On Tue, Mar 21, 2017 at 9:08 PM, Mina Aslani wrote: > Hi, > > I am trying to underst

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-22 Thread Michael Noll
Forwarding to kafka-user. -- Forwarded message -- From: Michael Noll Date: Wed, Mar 22, 2017 at 8:48 AM Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API To: d...@kafka.apache.org Matthias, > @Michael: > > You seemed to agree with Jay about not exp

Re: Error in running PageViewTypedDemo

2017-03-22 Thread Michael Noll
IIRC the PageViewTypedDemo example requires input data where the username/userId is captured in the keys of messages/records, and further information in the values of those messages. The problem you are running into is that, when you are writing your input data via the console producer, the record
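For illustration, a hedged sketch of producing a keyed input record from Java (topic name, key, and JSON value are assumptions; the point is only that the key is set explicitly):

```java
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
    producer.send(new ProducerRecord<>("streams-pageview-input",
        "alice", "{\"user\":\"alice\",\"page\":\"index.html\"}"));  // key = user, value = JSON payload
}
```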

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-22 Thread Michael Noll
To add to what Matthias said, in case the following isn't clear: - You should not (and, in 0.10.2, cannot any longer) call the iterator's remove() method, i.e. `KeyValueIterator#remove()` when iterating through a `KeyValueStore`. Perhaps this is something we should add to the `KeyValueIterator` j
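A sketch of the recommended pattern instead (assuming a `KeyValueStore<String, Long> store`; the predicate is a placeholder): iterate read-only, collect the keys, then delete through the store API rather than the iterator:

```java
List<String> keysToDelete = new ArrayList<>();
try (KeyValueIterator<String, Long> iter = store.all()) {
    while (iter.hasNext()) {
        KeyValue<String, Long> entry = iter.next();
        if (entry.value == 0L) {
            keysToDelete.add(entry.key);
        }
    }
}
for (String key : keysToDelete) {
    store.delete(key);   // instead of iter.remove(), which is unsupported
}
```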

Re: Error in running PageViewTypedDemo

2017-03-23 Thread Michael Noll
mThread.runLoop( > > StreamThread.java:415) > > at org.apache.kafka.streams.processor.internals. > > StreamThread.run(StreamThread.java:242) > > > > Any example on the correct input value is really appreciated. > > > > Thanks > > > > On Wed, Mar 2

Re: Getting current value of aggregated key

2017-03-23 Thread Michael Noll
Jon, you can use Kafka's interactive queries feature for this: http://docs.confluent.io/current/streams/developer-guide.html#interactive-queries -Michael On Thu, Mar 23, 2017 at 1:52 PM, Jon Yeargers wrote: > If I have an aggregation : > > KTable, VideoLogLine> outTable = > sourceStream.grou
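A hedged sketch of such a lookup (assuming a running `KafkaStreams streams` instance; store name, key, and types are placeholders, and the store name must match the one given to the aggregation):

```java
ReadOnlyKeyValueStore<String, Long> view =
    streams.store("aggregation-store", QueryableStoreTypes.keyValueStore());
Long currentValue = view.get("some-key");  // latest aggregate for that key on this instance
```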

Re: clearing an aggregation?

2017-03-23 Thread Michael Noll
his sort of 'aggregate and clear' approach still requires an > external datastore (like Redis). Please correct me if Im wrong. > > On Mon, Mar 20, 2017 at 9:29 AM, Michael Noll > wrote: > > > Jon, > > > > the windowing operation of Kafka's Streams API

Re: APPLICATION_SERVER_CONFIG ?

2017-03-24 Thread Michael Noll
> If I understand this correctly: assuming I have a simple aggregator > distributed across n-docker instances each instance will _also_ need to > support some sort of communications process for allowing access to its > statestore (last param from KStream.groupby.aggregate). Yes. See http://docs.c
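A minimal sketch of that setting (host, port, and other values are placeholders); each instance advertises its own endpoint, and you still have to run your own lightweight RPC layer, e.g. a small HTTP service, at that address:

```java
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-aggregator");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "this-host:4460");  // advertised for interactive queries
```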

Re: Streams RocksDBException with no message?

2017-03-27 Thread Michael Noll
We're talking about `ulimit` (CLI tool) and the `nofile` limit (number of open files), which you can access via `ulimit -n`. Examples: https://access.redhat.com/solutions/61334 https://stackoverflow.com/questions/21515463/how-to-increase-maximum-file-open-limit-ulimit-in-ubuntu Depending on the o

Re: YASSQ (yet another state store question)

2017-03-27 Thread Michael Noll
IIRC this may happen, for example, if the first host runs all the stream tasks (here: 2 in total) and migration of stream task(s) to the second host hasn't happened yet. -Michael On Sun, Mar 26, 2017 at 3:14 PM, Jon Yeargers wrote: > Also - if I run this on two hosts - what does it imply if t

Re: using a state store for deduplication

2017-03-27 Thread Michael Noll
Jon, Damian already answered your direct question, so my comment is a FYI: There's a demo example at https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java (this is for Confluent 3.2 / Kafka 0.10.2

Re: APPLICATION_SERVER_CONFIG ?

2017-03-27 Thread Michael Noll
kafka since I joined this company last year. I think my > largest issue is rethinking some preexisting notions about streaming to > make them work in the kstream universe. > > On Fri, Mar 24, 2017 at 6:07 AM, Michael Noll > wrote: > > > > If I understand thi

Re: Custom stream processor not triggering #punctuate()

2017-03-28 Thread Michael Noll
Elliot, in the current API, `punctuate()` is called based on the current stream-time (which defaults to event-time), not based on the current wall-clock time / processing-time. See http://docs.confluent.io/current/streams/faq.html#why-is-punctuate-not-called. The stream-time is advanced only wh
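A hedged sketch of that 0.10.x-era Processor API behavior (interval and logic are placeholders): `schedule()` requests `punctuate()` calls in stream-time, so nothing fires while no new records arrive:

```java
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class MyProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        context.schedule(60_000L);  // "every 60 seconds" of stream-time, not wall-clock time
    }

    @Override
    public void process(String key, String value) {
        // handling new records is what advances stream-time
    }

    @Override
    public void punctuate(long streamTime) {
        // invoked once stream-time has advanced past the scheduled interval
    }

    @Override
    public void close() { }
}
```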

Re: Understanding ReadOnlyWindowStore.fetch

2017-03-29 Thread Michael Noll
Jon, there's a related example, using a window store and a key-value store, at https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/ValidateStateWithInteractiveQueriesLambdaIntegrationTest.java (this is for Confluent 3.2 / Kafka 0.10.2). -M

Re: Using Kafka Stream in the cluster of kafka on one or multiple docker-machine/s

2017-03-30 Thread Michael Noll
main/java/io/confluent/examples/streams/ > > WordCountLambdaExample.java#L55-L62 and > > https://github.com/confluentinc/examples/tree/3.2.x/kafka- > > streams#packaging-and-running I missed the fact that the jar should be > > run in a separate container. > > > > Bes

Re: ThoughWorks Tech Radar: Assess Kafka Streams

2017-03-30 Thread Michael Noll
Aye! Thanks for sharing, Jan. :-) On Wed, Mar 29, 2017 at 8:56 PM, Eno Thereska wrote: > Thanks for the heads up Jan! > > Eno > > > On 29 Mar 2017, at 19:08, Jan Filipiak wrote: > > > > Regardless of how usefull you find the tech radar. > > > > Well deserved! even though we all here agree that

Re: weird SerializationException when consumer is fetching and parsing record in streams application

2017-03-30 Thread Michael Noll
Could this be a corrupted message ("poison pill") in your topic? If so, take a look at http://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-messages FYI: We're currently investigating a more elegant way to address such poison pill pro
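One common way to handle such poison pills, sketched here under the assumption that silently skipping corrupt records is acceptable for the application (class and type names are placeholders): wrap the real deserializer, return null on failure, and filter the nulls out downstream:

```java
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;

public class SkippingDeserializer<T> implements Deserializer<T> {
    private final Deserializer<T> inner;

    public SkippingDeserializer(Deserializer<T> inner) {
        this.inner = inner;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        inner.configure(configs, isKey);
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return inner.deserialize(topic, data);
        } catch (RuntimeException e) {
            return null;  // downstream: stream.filter((key, value) -> value != null)
        }
    }

    @Override
    public void close() {
        inner.close();
    }
}
```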

Re: Understanding ReadOnlyWindowStore.fetch

2017-03-30 Thread Michael Noll
ey().reduce(..., "somekeystore"); > >> > > >> > and then call this: > >> > > >> > kt.forEach()-> ... > >> > > >> > Can I assume that everything that comes out will be available in > >> > "

Re: weird SerializationException when consumer is fetching and parsing record in streams application

2017-03-30 Thread Michael Noll
V data = null; > > > > > try { > > > > > data = objectMapper.readValue(paramArrayOfByte, new > > > > > TypeReference() {}); > > > > > } catch (Exception e) { > > > > > e.printStackTra

Re: weird SerializationException when consumer is fetching and parsing record in streams application

2017-03-30 Thread Michael Noll
Hmm, I re-read the stacktrace again. It does look like the value-side being the culprit (as Sachin suggested earlier). -Michael On Thu, Mar 30, 2017 at 3:18 PM, Michael Noll wrote: > Sachin, > > you have this line: > > > builder.stream(Serdes.String(), serde, "advice

Re: weird SerializationException when consumer is fetching and parsing record in streams application

2017-03-30 Thread Michael Noll
Sachin, there's a JIRA that seems related to what you're seeing: https://issues.apache.org/jira/browse/KAFKA-4740 Perhaps you could check the above and report back? -Michael On Thu, Mar 30, 2017 at 3:23 PM, Michael Noll wrote: > Hmm, I re-read the stacktrace again. It does

Re: auto.offset.reset for Kafka streams 0.10.2.0

2017-04-11 Thread Michael Noll
It's also documented at http://docs.confluent.io/current/streams/developer-guide.html#non-streams-configuration-parameters . FYI: We have already begun syncing the Confluent docs for Streams into the Apache Kafka docs for Streams, but there's still quite some work left (volunteers are welcome :-P)
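A short sketch of passing that consumer-level setting through the Streams configuration (application id and broker address are placeholders):

```java
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");  // forwarded to the app's internal consumers
```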

Re: Kafka Streams: a back-pressure question for windowed streams

2017-04-20 Thread Michael Noll
Hi there! In short, Kafka Streams ensures that your application consumes only as much data (or: as fast) as it can process it. The main "problem" you might encounter is not that you run into issues with state stores (like in-memory stores or RocksDB stores), but -- which is a more general issue -

Re: Joining on non-keyed values - how to lookup fields

2017-04-20 Thread Michael Noll
Jon, the recently introduced GlobalKTable ("global tables") allow you to perform non-key lookups. See http://docs.confluent.io/current/streams/developer-guide.html#kstream-globalktable-join (and the javadocs link) > So called "internal" values can't be looked up. If I understand you correctly:
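A hedged sketch of such a KStream-GlobalKTable join (0.10.2-era API; topic names, store name, and the "userId|payload" value layout are assumptions, and a `KStreamBuilder builder` is assumed). The KeyValueMapper derives the lookup key from the stream record's value, which is what enables the non-key lookup:

```java
GlobalKTable<String, String> userProfiles =
    builder.globalTable(Serdes.String(), Serdes.String(), "user-profiles", "user-profiles-store");
KStream<String, String> clicks =
    builder.stream(Serdes.String(), Serdes.String(), "clicks");

KStream<String, String> enriched = clicks.join(
    userProfiles,
    (clickKey, clickValue) -> clickValue.split("\\|")[0],    // lookup key taken from the value
    (clickValue, profile) -> clickValue + " / " + profile);  // combine stream value with table value
```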

Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-24 Thread Michael Noll
Congratulations, Rajini! On Mon, Apr 24, 2017 at 11:50 PM, Ismael Juma wrote: > Congrats Rajini! Well-deserved. :) > > Ismael > > On Mon, Apr 24, 2017 at 10:06 PM, Gwen Shapira wrote: > > > The PMC for Apache Kafka has invited Rajini Sivaram as a committer and we > > are pleased to announce tha

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-04-28 Thread Michael Noll
To add to what Eno said: You can of course use the Kafka Streams API to build an application that consumes from multiple Kafka topics. But, going back to your original question, the scalability of Kafka and the Kafka Streams API is based on partitions, not on topics. -Michael On Fri, Apr 28,

Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-30 Thread Michael Noll
Thanks for your work on this KIP, Eno -- much appreciated! - I think it would help to improve the KIP by adding an end-to-end code example that demonstrates, with the DSL and with the Processor API, how the user would write a simple application that would then be augmented with the proposed KIP ch

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-06 Thread Michael Noll
Happy to hear you found a working solution, Steven! -Michael On Sat, Jun 3, 2017 at 12:53 AM, Steven Schlansker < sschlans...@opentable.com> wrote: > > > > On Jun 2, 2017, at 3:32 PM, Matthias J. Sax > wrote: > > > > Thanks. That helps to understand the use case better. > > > > Rephrase to ma
