Thanks for bringing this up, Matthias.
+1
On Wed, Feb 1, 2017 at 8:15 AM, Gwen Shapira wrote:
> +1
>
> On Tue, Jan 31, 2017 at 5:57 PM, Matthias J. Sax
> wrote:
> > Hi,
> >
> > I want to collect feedback about the idea to publish docs for current
> > trunk version of Apache Kafka.
> >
> > Curr
Many thanks for the KIP and the PR, Steven!
My opinion, too, is that we should consider including this.
One thing that I would like to see clarified is the difference between the
proposed peek() and existing functions map() and foreach(), for instance.
My understanding (see also the Java 8 links
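For illustration, here's a minimal sketch of how the proposed peek() would differ from map() and foreach(). Note that peek() is the KIP's proposal, not a released API yet, and the topic names are made up:

import java.util.Properties;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class PeekSketch {
  public static void main(String[] args) {
    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> stream = builder.stream("input-topic");

    // map(): transforms each record and returns a new KStream
    KStream<String, Integer> lengths =
        stream.map((key, value) -> KeyValue.pair(key, value.length()));

    // foreach(): side effect only; terminal operation, returns void
    stream.foreach((key, value) -> System.out.println(key + " -> " + value));

    // peek() (as proposed): side effect like foreach(), but returns the
    // unchanged KStream so the processing chain can continue
    stream.peek((key, value) -> System.out.println(key + " -> " + value))
          .to("output-topic");
  }
}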
Adam,
also a FYI: The upcoming 0.10.2 version of the Streams API will be
backwards compatible with 0.10.1 clusters, so you can keep your brokers on
0.10.1.1 and still use the latest Streams API version (including the one
from trunk, as Matthias mentioned).
-Michael
On Mon, Feb 13, 2017 at 1:04
> By the way - do I understand correctly that when a state store is
persistent, it is logged by default?
Yes.
> So enableLogging(Map) only is a way to provide default configuration to
the default logging?
Yes. That is, any configs that should be applied to the state store's
changelog topic.
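For example, a sketch using the Stores builder (assuming the 0.10.x API; the store name and config values are made up):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

public class LoggedStoreSketch {
  public static void main(String[] args) {
    // Configs to apply to the store's changelog topic (values made up)
    Map<String, String> changelogConfig = new HashMap<>();
    changelogConfig.put("retention.ms", "172800000");

    StateStoreSupplier store = Stores.create("my-store")
        .withKeys(Serdes.String())
        .withValues(Serdes.Long())
        .persistent()                   // persistent => logged by default
        .enableLogging(changelogConfig) // configs for the changelog topic
        .build();
  }
}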
>
Dmitry,
I think your use case is similar to the one I described in the link below
(discussion in the kafka-dev mailing list):
http://search-hadoop.com/m/uyzND1rVOQ12OJ84U&subj=Re+Streams+TTLCacheStore
Could you take a quick look?
-Michael
On Wed, Feb 22, 2017 at 12:39 AM, Dmitry Minkovsky
wrote:
Pete,
have you looked at Kafka's Streams API yet?
There are many examples available in the `kafka-streams` folder at
https://github.com/confluentinc/examples. The simplest example of "Do
something to a new data record as soon as it arrives" might be
the MapFunctionLambdaExample. You can create differ
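As a minimal sketch of that pattern (topic names and the uppercase transform are made up for illustration):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MapSketch {
  public static void main(String[] args) {
    Properties config = new Properties();
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, "map-sketch");
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> input = builder.stream("input-topic");
    // Each record is transformed as soon as it arrives
    input.mapValues(value -> value.toUpperCase()).to("output-topic");

    new KafkaStreams(builder, config).start();
  }
}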
> Also, is it possible to stop the syncing between state stores to brokers,
if I am fine with failures?
Yes, you can disable the syncing (or the "changelog" feature) of state
stores:
http://docs.confluent.io/current/streams/developer-guide.html#enable-disable-state-store-changelogs
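A sketch of what disabling the changelog looks like with the Stores builder (assuming the 0.10.x API; names are made up):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

public class UnloggedStoreSketch {
  public static void main(String[] args) {
    StateStoreSupplier store = Stores.create("my-store")
        .withKeys(Serdes.String())
        .withValues(Serdes.Long())
        .persistent()
        .disableLogging() // no changelog topic: state is not restored after failures
        .build();
  }
}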
> I do have a
size 1 hour and retention of 3 hours.
>
> So to conclude if you can manage rocks db, then kafka streams is good to
> start with, its simple and very intuitive to use.
>
> Again on rocksdb side, is there a way to eliminate that and is
>
> disableLogging
>
> for that?
>
Good point, Steven. +1 here.
On Wed, Mar 1, 2017 at 8:52 AM, Damian Guy wrote:
> +1
> On Wed, 1 Mar 2017 at 07:15, Guozhang Wang wrote:
>
> > Hey Steven,
> >
> > That is a good question, and I think your proposal makes sense. Could you
> > file a JIRA for this change to keep track of it?
> >
>
I'd use option 2 (Kafka Connect).
Advantages of #2:
- The code is decoupled from the processing code and easier to refactor in
the future. (same as #4)
- The runtime/uptime/scalability of your Kafka Streams app (processing) is
decoupled from the runtime/uptime/scalability of the data ingestion in
The DSL has some unique features that aren't in the Processor API, such as:
- KStream and KTable abstractions.
- Support for time windows (tumbling windows, hopping windows) and session
windows. The Processor API only has stream-time based `punctuate()`.
- Record caching, which is slightly better
There's also an end-to-end example for DSL and Processor API integration:
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/MixAndMatchLambdaIntegrationTest.java
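To make the windowing point above concrete, here's a minimal DSL sketch of a tumbling-window count (the topic, store name, and window size are made up):

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowedCountSketch {
  public static void main(String[] args) {
    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> pageViews = builder.stream("page-views");

    // Count page views per key in tumbling 1-minute windows
    KTable<Windowed<String>, Long> counts = pageViews
        .groupByKey()
        .count(TimeWindows.of(60 * 1000L), "counts-store");
  }
}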
Best,
Michael
On Tue, Mar 7, 2017 at 4:51 PM, LongTian Wang wrote:
> Re
Thanks for the update, Matthias.
+1 to the points 1, 2, 3, and 4 you mentioned.
Naming is always a tricky subject, but renaming KStreamBuilder
to StreamsTopologyBuilder looks ok to me (I would have had a slight
preference towards DslTopologyBuilder, but hey.) The most important aspect
is, IMHO, what yo
There's actually a demo application that demonstrates the simplest use case
for Kafka's Streams API: to read data from an input topic and then write
that data as-is to an output topic.
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/Pa
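In essence, the demo boils down to a one-liner like this sketch (topic names are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class PassThroughSketch {
  public static void main(String[] args) {
    Properties config = new Properties();
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, "pass-through-sketch");
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    KStreamBuilder builder = new KStreamBuilder();
    // Read from the input topic and write each record as-is to the output topic
    builder.stream("input-topic").to("output-topic");

    new KafkaStreams(builder, config).start();
  }
}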
I think a related JIRA ticket is
https://issues.apache.org/jira/browse/KAFKA-4829 (see Guozhang's comment
about the ticket's scope).
-Michael
On Thu, Mar 9, 2017 at 6:22 PM, Damian Guy wrote:
> Hi Nicolas,
>
> Please do file a JIRA.
>
> Many thanks,
> Damian
>
> On Thu, 9 Mar 2017 at 15:54 Nic
In addition to what Eno already mentioned, here's some quick feedback:
- Only for reference, I'd add that 20GB of state is not necessarily
"massive" in absolute terms. I have talked to users whose apps manage much
more state than that (1-2 orders of magnitude more). Whether or not 20 GB
is massiv
application's processing needs.
-Michael
On Tue, Mar 14, 2017 at 9:00 AM, BYEONG-GI KIM wrote:
> Dear Michael Noll,
>
> I have a question; Is it possible converting JSON format to YAML format via
> using Kafka Streams?
>
> Best Regards
>
> KIM
>
> 2017-03-10 11:36 GMT+09
I see Jay's point, and I agree with much of it -- notably about being
careful which concepts we do and do not expose, depending on which user
group / user type is affected. That said, I'm not sure yet whether or not
we should get rid of "Topology" (or a similar term) in the DSL.
For what it's wor
Mina,
in your original question you wrote:
> However, I do not see the word count when I try to run below example.
Looks like that it does not connect to Kafka.
The WordCount demo example writes its output to Kafka only -- it *does
not* write any results to the console/STDOUT.
From what I can
ob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181)
> in my IDE was not and still is not working.
>
> Best regards,
> Mina
>
>
> On Wed, Mar 15, 2017 at 4:43 AM, Michael Noll
> wrote:
>
> > Mina,
> >
Hi Armaan,
> org.apache.spark.SparkException: Job aborted due to stage failure:
>Task 0.0 in stage 0.0 (TID 0) had a not serializable result:
org.apache.kafka.clients.consumer.ConsumerRecord
perhaps you should ask that question in the Spark mailing list, which
should increase your chances of
And since you asked for a pointer, Ali:
http://docs.confluent.io/current/streams/concepts.html#windowing
On Mon, Mar 20, 2017 at 5:43 PM, Michael Noll wrote:
> Late-arriving and out-of-order data is only treated specially for windowed
> aggregations.
>
> For stateless operat
Late-arriving and out-of-order data is only treated specially for windowed
aggregations.
For stateless operations such as `KStream#foreach()` or `KStream#map()`,
records are processed in the order they arrive (per partition).
-Michael
On Sat, Mar 18, 2017 at 10:47 PM, Ali Akhtar wrote:
> >
Jon,
the windowing operation of Kafka's Streams API (in its DSL) aligns
time-based windows to the epoch [1]:
Quoting from the documentation on hopping windows (sometimes called sliding
windows in other technologies):
> Hopping time windows are aligned to the epoch, with the lower interval
> bound being inclusive and the upper bound being exclusive.
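To illustrate the epoch alignment, a small sketch (the window size and advance interval are made up):

import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.kstream.TimeWindows;

public class HoppingWindowSketch {
  public static void main(String[] args) {
    // 5-minute windows advancing by 1 minute. Because windows are aligned
    // to the epoch, the first windows are [0; 300000), [60000; 360000), ...
    // regardless of the timestamp of the first record.
    TimeWindows hopping = TimeWindows
        .of(TimeUnit.MINUTES.toMillis(5))
        .advanceBy(TimeUnit.MINUTES.toMillis(1));
  }
}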
t possible to get metadata on the message, such as whether or not
> its late, its index/position within the other messages, etc?
>
> On Mon, Mar 20, 2017 at 9:44 PM, Michael Noll
> wrote:
>
> > And since you asked for a pointer, Ali:
> > http://docs.confluent.io/current/st
ucket) should be started, and future messages should belong to that
> 'session', until the next 30+ min gap).
>
> On Mon, Mar 20, 2017 at 11:44 PM, Michael Noll
> wrote:
>
> > > Can windows only be used for aggregations, or can they also be used for
> > fore
d the topology if provided as a constructor argument. However,
> >> especially for DSL (not sure if it would make sense for PAPI), the DSL
> >> builder could create the client for the user.
> >>
> >> Something like this:
> >>
> >>> KStreamBuilder bu
Typically you'd containerize your app and then launch e.g. 10 containers if
you need to run 10 instances of your app.
Also, what do you mean by "in a cluster of Kafka containers" and "in the
cluster of Kafkas"?
On Tue, Mar 21, 2017 at 9:08 PM, Mina Aslani wrote:
> Hi,
>
> I am trying to underst
Forwarding to kafka-user.
-- Forwarded message --
From: Michael Noll
Date: Wed, Mar 22, 2017 at 8:48 AM
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
To: d...@kafka.apache.org
Matthias,
> @Michael:
>
> You seemed to agree with Jay about not exp
IIRC the PageViewTypedDemo example requires input data where the
username/userId is captured in the keys of messages/records, and further
information in the values of those messages.
The problem you are running into is that, when you are writing your input
data via the console producer, the record
To add to what Matthias said, in case the following isn't clear:
- You should not (and, in 0.10.2, cannot any longer) call the iterator's
remove() method, i.e. `KeyValueIterator#remove()` when iterating through a
`KeyValueStore`. Perhaps this is something we should add to the
`KeyValueIterator` j
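For reference, a sketch of read-only iteration over a KeyValueStore (the store's key/value types are made up; note that the iterator must be closed, and remove() must not be called):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class StoreIterationSketch {
  static void dump(KeyValueStore<String, Long> store) {
    KeyValueIterator<String, Long> iter = store.all();
    try {
      while (iter.hasNext()) {
        KeyValue<String, Long> entry = iter.next();
        System.out.println(entry.key + " = " + entry.value);
        // do NOT call iter.remove() -- the iterator is read-only
      }
    } finally {
      iter.close(); // always close to release underlying resources
    }
  }
}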
mThread.runLoop(StreamThread.java:415)
> > at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
> >
> > Any example on the correct input value is really appreciated.
> >
> > Thanks
> >
> > On Wed, Mar 2
Jon,
you can use Kafka's interactive queries feature for this:
http://docs.confluent.io/current/streams/developer-guide.html#interactive-queries
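A minimal sketch of a local lookup (the store name and types are made up; "streams" is your running KafkaStreams instance):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQuerySketch {
  static Long lookup(KafkaStreams streams, String key) {
    ReadOnlyKeyValueStore<String, Long> store =
        streams.store("my-store", QueryableStoreTypes.<String, Long>keyValueStore());
    return store.get(key); // queries only this instance's local state
  }
}

Keep in mind this only queries the local state of that instance; to query state held by other instances of your app, you need to expose it yourself (e.g. via a REST layer).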
-Michael
On Thu, Mar 23, 2017 at 1:52 PM, Jon Yeargers
wrote:
> If I have an aggregation :
>
> KTable, VideoLogLine> outTable =
> sourceStream.grou
his sort of 'aggregate and clear' approach still requires an
> external datastore (like Redis). Please correct me if Im wrong.
>
> On Mon, Mar 20, 2017 at 9:29 AM, Michael Noll
> wrote:
>
> > Jon,
> >
> > the windowing operation of Kafka's Streams API
> If I understand this correctly: assuming I have a simple aggregator
> distributed across n-docker instances each instance will _also_ need to
> support some sort of communications process for allowing access to its
> statestore (last param from KStream.groupby.aggregate).
Yes.
See
http://docs.c
We're talking about `ulimit` (CLI tool) and the `nofile` limit (number of
open files), which you can access via `ulimit -n`.
Examples:
https://access.redhat.com/solutions/61334
https://stackoverflow.com/questions/21515463/how-to-increase-maximum-file-open-limit-ulimit-in-ubuntu
Depending on the o
IIRC this may happen, for example, if the first host runs all the stream
tasks (here: 2 in total) and migration of stream task(s) to the second host
hasn't happened yet.
-Michael
On Sun, Mar 26, 2017 at 3:14 PM, Jon Yeargers
wrote:
> Also - if I run this on two hosts - what does it imply if t
Jon,
Damian already answered your direct question, so my comment is a FYI:
There's a demo example at
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java
(this is for Confluent 3.2 / Kafka 0.10.2
kafka since I joined this company last year. I think my
> largest issue is rethinking some preexisting notions about streaming to
> make them work in the kstream universe.
>
> On Fri, Mar 24, 2017 at 6:07 AM, Michael Noll
> wrote:
>
> > > If I understand thi
Elliot,
in the current API, `punctuate()` is called based on the current
stream-time (which defaults to event-time), not based on the current
wall-clock time / processing-time. See
http://docs.confluent.io/current/streams/faq.html#why-is-punctuate-not-called.
The stream-time is
advanced only wh
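A sketch of how the relevant Processor hooks fit together (assuming the 0.10.x Processor API; the 60-second interval is made up):

import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class PunctuateSketch extends AbstractProcessor<String, String> {
  @Override
  public void init(ProcessorContext context) {
    super.init(context);
    // Request punctuate() every 60s of stream-time, NOT wall-clock time
    context().schedule(60 * 1000L);
  }

  @Override
  public void process(String key, String value) {
    // stream-time only advances as new records (with newer timestamps) arrive
  }

  @Override
  public void punctuate(long streamTime) {
    // invoked once stream-time has advanced by at least the scheduled interval
  }
}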
Jon,
there's a related example, using a window store and a key-value store, at
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/ValidateStateWithInteractiveQueriesLambdaIntegrationTest.java
(this is for Confluent 3.2 / Kafka 0.10.2).
-M
main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L55-L62
> > and
> > https://github.com/confluentinc/examples/tree/3.2.x/kafka-streams#packaging-and-running
> > I missed the fact that the jar should be run in a separate container.
> >
> > Bes
Aye! Thanks for sharing, Jan. :-)
On Wed, Mar 29, 2017 at 8:56 PM, Eno Thereska
wrote:
> Thanks for the heads up Jan!
>
> Eno
>
> > On 29 Mar 2017, at 19:08, Jan Filipiak wrote:
> >
> > Regardless of how useful you find the tech radar.
> >
> > Well deserved! even though we all here agree that
Could this be a corrupted message ("poison pill") in your topic?
If so, take a look at
http://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-messages
FYI: We're currently investigating a more elegant way to address such
poison pill pro
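Until then, one common workaround (a sketch, not the FAQ's exact code) is to wrap your deserializer so corrupted records come out as null and can be filtered away downstream:

import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;

public class TolerantDeserializer<T> implements Deserializer<T> {
  private final Deserializer<T> inner;

  public TolerantDeserializer(Deserializer<T> inner) {
    this.inner = inner;
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    inner.configure(configs, isKey);
  }

  @Override
  public T deserialize(String topic, byte[] bytes) {
    try {
      return inner.deserialize(topic, bytes);
    } catch (RuntimeException e) {
      return null; // poison pill: skip later, e.g. stream.filter((k, v) -> v != null)
    }
  }

  @Override
  public void close() {
    inner.close();
  }
}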
ey().reduce(..., "somekeystore");
> >> >
> >> > and then call this:
> >> >
> >> > kt.forEach()-> ...
> >> >
> >> > Can I assume that everything that comes out will be available in
> >> > "
V data = null;
> > > > > try {
> > > > > data = objectMapper.readValue(paramArrayOfByte, new
> > > > > TypeReference<V>() {});
> > > > > } catch (Exception e) {
> > > > > e.printStackTrace();
Hmm, I re-read the stacktrace again. It does look like the value-side being
the culprit (as Sachin suggested earlier).
-Michael
On Thu, Mar 30, 2017 at 3:18 PM, Michael Noll wrote:
> Sachin,
>
> you have this line:
>
> > builder.stream(Serdes.String(), serde, "advice
Sachin,
there's a JIRA that seems related to what you're seeing:
https://issues.apache.org/jira/browse/KAFKA-4740
Perhaps you could check the above and report back?
-Michael
On Thu, Mar 30, 2017 at 3:23 PM, Michael Noll wrote:
> Hmm, I re-read the stacktrace again. It does
It's also documented at
http://docs.confluent.io/current/streams/developer-guide.html#non-streams-configuration-parameters
.
FYI: We have already begun syncing the Confluent docs for Streams into the
Apache Kafka docs for Streams, but there's still quite some work left
(volunteers are welcome :-P)
Hi there!
In short, Kafka Streams ensures that your application consumes only as much
data as it can process, and only as fast as it can process it.
The main "problem" you might encounter is not that you run into issues with
state stores (like in-memory stores or RocksDB stores), but -- which is a
more general issue -
Jon,
the recently introduced GlobalKTable ("global tables") allow you to perform
non-key lookups.
See
http://docs.confluent.io/current/streams/developer-guide.html#kstream-globalktable-join
(and the javadocs link)
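A sketch of such a non-key lookup join (the Order/Customer/EnrichedOrder types and topic names are hypothetical):

import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class GlobalTableJoinSketch {
  static class Order { String customerId; String getCustomerId() { return customerId; } }
  static class Customer { String name; }
  static class EnrichedOrder {
    final Order order; final Customer customer;
    EnrichedOrder(Order o, Customer c) { order = o; customer = c; }
  }

  public static void main(String[] args) {
    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, Order> orders = builder.stream("orders");
    GlobalKTable<String, Customer> customers =
        builder.globalTable("customers", "customers-store");

    // The KeyValueMapper extracts the lookup key from the stream record,
    // so the join does not need to be on the stream's own key
    KStream<String, EnrichedOrder> enriched = orders.join(
        customers,
        (orderId, order) -> order.getCustomerId(),
        (order, customer) -> new EnrichedOrder(order, customer));
  }
}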
> So called "internal" values can't be looked up.
If I understand you correctly:
Congratulations, Rajini!
On Mon, Apr 24, 2017 at 11:50 PM, Ismael Juma wrote:
> Congrats Rajini! Well-deserved. :)
>
> Ismael
>
> On Mon, Apr 24, 2017 at 10:06 PM, Gwen Shapira wrote:
>
> > The PMC for Apache Kafka has invited Rajini Sivaram as a committer and we
> > are pleased to announce tha
To add to what Eno said:
You can of course use the Kafka Streams API to build an application that
consumes from multiple Kafka topics. But, going back to your original
question, the scalability of Kafka and the Kafka Streams API is based on
partitions, not on topics.
-Michael
On Fri, Apr 28,
Thanks for your work on this KIP, Eno -- much appreciated!
- I think it would help to improve the KIP by adding an end-to-end code
example that demonstrates, with the DSL and with the Processor API, how the
user would write a simple application that would then be augmented with the
proposed KIP ch
Happy to hear you found a working solution, Steven!
-Michael
On Sat, Jun 3, 2017 at 12:53 AM, Steven Schlansker <
sschlans...@opentable.com> wrote:
> >
> > On Jun 2, 2017, at 3:32 PM, Matthias J. Sax
> wrote:
> >
> > Thanks. That helps to understand the use case better.
> >
> > Rephrase to ma