Moved this KIP into status "inactive". Feel free to resume and any time.
-Matthias
On 7/15/18 6:55 PM, Matthias J. Sax wrote:
> I think it would make a lot of sense to provide a simple DSL abstraction.
>
> Something like:
>
> KStream stream = ...
> KTable count = stream.count();
>
> The missin
I think it would make a lot of sense to provide a simple DSL abstraction.
Something like:
KStream stream = ...
KTable count = stream.count();
The missing groupBy() or grouByKey() class indicates a global counting
operation. The JavaDocs should highlight the impact.
One open question is, what ke
That's a lot of email exchanges for me to catch up :)
My original proposed alternative solution is indeed relying on
pre-aggregate before sending to the single-partition topic, so that the
traffic on that single-partition topic would not be huge (I called it
partial-aggregate but the intent was th
Ok, I didn't get quite as far as I hoped, and several things are far from
ready, but here's what I have so far:
https://github.com/apache/kafka/pull/5337
The "unit" test works, and is a good example of how you should expect it to
behave:
https://github.com/apache/kafka/pull/5337/files#diff-2fdec52
Hey Flávio,
Thanks! I haven't got anything usable yet, but I'm working on it now. I'm
hoping to push up my branch by the end of the day.
I don't know if you've seen it but Streams actually already has something
like this, in the form of caching on materialized stores. If you pass in a
"Materializ
John, that was fantastic, man!
Have you built any custom implementation of your KIP in your machine so that I
could test it out here? I wish I could test it out.
If you need any help implementing this feature, please tell me.
Thanks.
-Flávio Stutz
On 2018/07/03 18:04:52, John Roesler wrote:
Hi Flávio,
Thanks! I think that we can actually do this, but the API could be better.
I've included Java code below, but I'll copy and modify your example so
we're on the same page.
EXERCISE 1:
- The case is "total counting of events for a huge website"
- Tasks from Application A will have som
Great feature you have there!
I'll try to exercise here how we would achieve the same functional objectives
using your KIP:
EXERCISE 1:
- The case is "total counting of events for a huge website"
- Tasks from Application A will have something like:
.stream(/site-events)
.co
Hi Flávio,
Sure thing. And apologies in advance if I missed the point.
Below is some more-or-less realistic Java code to demonstrate how, given a
high-volume (heavily partitioned) stream of purchases, we can "step down"
the update rate with rate-limited intermediate aggregations.
Please bear in m
Thanks for clarifying the real usage of KIP-328. Now I understood a bit better.
I didn't see how that feature would be used to minimize the number of
publications to the single partitioned output topic. When it is falls into
supression, the graph stops going down? Could you explain better? If tha
Hi Flávio,
Thanks for the KIP. I'll apologize that I'm arriving late to the
discussion. I've tried to catch up, but I might have missed some nuances.
Regarding KIP-328, the idea is to add the ability to suppress intermediate
results from all KTables, not just windowed ones. I think this could
sup
For what I understood, that KIP is related to how KStreams will handle KTable
updates in Windowed scenarios to optimize resource usage.
I couldn't see any specific relation to this KIP. Had you?
-Flávio Stutz
On 2018/06/29 18:14:46, "Matthias J. Sax" wrote:
> Flavio,
>
> thanks for cleaning
> I agree with Guozhang on comparing the pros and cons of the approach he
> outlined vs the one in the proposed KIP.
I've just replied him. Please take a look.
> Will the triggering mechanism always be time, or would it make sense to
> expand to use other mechanisms such as the number of records,
Cons:
We tried the "single partition" strategy, but the problem is that for each
incoming message to the Graph, we have another output message with the
aggregated (cummulative or not) result, so that if we have a million messages/s
(among all parallel tasks) being processed, we'll have another m
Repasting my comment from the other email thread:
--
Flávio, thanks for creating this KIP.
I think this "single-aggregation" use case is common enough that we should
consider how to efficiently supports it: for example, for KSQL that's built
on top of Streams, we've seen
Hi Flávio,
Thanks for creating the KIP.
I agree with Guozhang on comparing the pros and cons of the approach he
outlined vs the one in the proposed KIP.
I also have a few clarification questions on the current KIP
Will the triggering mechanism always be time, or would it make sense to
expand to
Flavio,
thanks for cleaning up the KIP number collision.
With regard to KIP-328
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables)
I am wondering how both relate to each other?
Any thoughts?
-Matthias
On 6/29/18 10:23 AM, flaviost...@gmail.c
Just copying a follow up from another thread to here (sorry about the mess):
From: Guozhang Wang
Subject: Re: [DISCUSS] KIP-323: Schedulable KTable as Graph source
Date: 2018/06/25 22:24:17
List: dev@kafka.apache.org
Flávio, thanks for creating this KIP.
I think this "single-aggregation" use ca
Actually this was intended to be registered under the number 323, but someone
else catch this number while the proposal was being edited. The correct number
and URL of the KIP is:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-326%3A+Schedulable+KTable+as+Graph+source
On 2018/06/26 20:22
What's the relationship between this KIP and KIP-323 ?
Thanks
On Tue, Jun 26, 2018 at 11:22 AM, Flávio Stutz
wrote:
> Hey, guys, I've just created a new KIP about creating a new DSL graph
> source for realtime partitioned consolidations.
>
> We have faced the following scenario/problem in a lot
Hey, guys, I've just created a new KIP about creating a new DSL graph
source for realtime partitioned consolidations.
We have faced the following scenario/problem in a lot of situations with
KStreams:
- Huge incoming data being processed by numerous application instances
- Need to aggregate
21 matches
Mail list logo