Folks,
I would like to revive this thread on KIP-28: I have just updated the patch,
rebased on the latest trunk, incorporating the feedback collected so far:
https://github.com/apache/kafka/pull/130
And the wiki page for this KIP has also been updated with the API and
architectural designs:
https://
Jiangjie,
Thanks for the explanation, now I understand the scenario. It is a CEP
(complex event processing) use case in stream processing, in which I think the local state should be
used for some sort of pattern matching. More concretely, let's say in this
case we have a local state storing what have been observed. Then th
Guozhang,
By interleaved groups of messages, I meant something like this: say we have
messages 0, 1, 2, 3; messages 0 and 2 together complete one piece of business
logic, and messages 1 and 3 together complete another. In that case, after
the user has processed message 2, they cannot commit offsets, because if they c
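To make the interleaving concrete, here is a toy Python sketch (purely illustrative, not part of the proposal) that computes the highest offset that is safe to commit: the offset just past the contiguous prefix of processed messages, so that no unprocessed message is silently skipped on restart.

```python
def safe_commit_offset(processed, all_offsets):
    """Return the offset that is safe to commit: one past the longest
    fully-processed prefix. Committing beyond this would mark
    unprocessed messages as consumed."""
    committed = 0
    for offset in all_offsets:
        if offset in processed:
            committed = offset + 1
        else:
            break
    return committed

# Messages 0 and 2 form one unit of business logic, 1 and 3 another.
# After processing messages 0 and 2, we can only commit offset 1,
# because message 1 is still unprocessed:
print(safe_commit_offset({0, 2}, [0, 1, 2, 3]))  # -> 1
```

This is exactly why committing after message 2 is unsafe in the interleaved case: the commit would cover offset 1, which belongs to a still-incomplete unit of logic.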
Hello folks,
I have updated the KIP page with some detailed API / architecture /
packaging proposals, along with the long promised first patch in PR:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client
https://github.com/apache/kafka/pull/130
Any feedbacks / comme
Hi Jun,
1. I have removed the streamTime in punctuate() since it is not only
triggered by clock time, detailed explanation can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client#KIP-28-Addaprocessorclient-StreamTime
2. Yes, if users do not schedule a
Hi Jiangjie,
Not sure I understand "What if users have interleaved groups of messages,
each group making a complete logic?" Could you elaborate a bit?
About the committing functionality: it currently will only commit up to the
processed message's offset; the commit() call itself actually does
A few questions/comments.
1. What's streamTime passed to punctuate()? Is that just the current time?
2. Is punctuate() only called if schedule() is called?
3. The way the KeyValueStore is created seems a bit weird. Since this is
part of the internal state managed by KafkaProcessorContext, it seems
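On question 2 above (whether punctuate() fires only when schedule() has been called), here is a toy Python sketch of that contract. The names loosely echo the KIP, but this is an illustrative model, not the proposed Java API; stream time here is simply advanced by record timestamps.

```python
class Processor:
    """Toy message-at-a-time processor illustrating the
    schedule()/punctuate() contract described in the KIP."""

    def __init__(self):
        self.punctuations = []
        self.interval = None        # unset until schedule() is called
        self.next_punctuate = None

    def schedule(self, interval):
        # Request periodic punctuate() callbacks in stream time.
        self.interval = interval
        self.next_punctuate = interval

    def process(self, timestamp, value):
        # Stream time advances with record timestamps; punctuate()
        # fires only if schedule() was called and the interval elapsed.
        while self.interval is not None and timestamp >= self.next_punctuate:
            self.punctuate(self.next_punctuate)
            self.next_punctuate += self.interval

    def punctuate(self, stream_time):
        self.punctuations.append(stream_time)

p = Processor()
p.schedule(10)
for ts in (3, 9, 12, 25):
    p.process(ts, "record")
print(p.punctuations)  # -> [10, 20]
```

A processor that never calls schedule() never sees a punctuate() callback, which matches the answer given for question 2.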
I think the abstraction of processor would be useful. It is not quite clear
to me yet though which grid in the following API analysis chart this
processor is trying to satisfy.
https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal
For example, in the current proposal, it
Just a quick ping, that regardless of the name of the thing, I'm still
interested in answers to my questions :)
On Tue, Jul 28, 2015 at 3:07 PM, Gwen Shapira wrote:
> Thanks Guozhang! Much clearer now, at least for me.
>
> Few comments / questions:
>
> 1. Perhaps punctuate(int numRecords) will
I agree with Sriram and Martin. Kafka is already about providing streams of
data, and so Kafka Streams or anything like that is confusing to me.
This new library is about making it easier to process the data.
-James
On Jul 30, 2015, at 9:38 AM, Aditya Auradkar
wrote:
> Personally, I prefer K
Personally, I prefer KafkaStreams just because it sounds nicer. For the
reasons identified above, KafkaProcessor or KProcessor is more apt but
sounds less catchy (IMO). I also think we should prefix with Kafka (rather
than K) because we will then have 3 clients: KafkaProducer, KafkaConsumer
and Kaf
I think it's also a matter of intent. If we see it as "yet another
client library", then Processor (to match Producer and Consumer) will
work great.
If we see it is a stream processing framework, the name has to start
with S to follow existing convention.
Speaking of naming conventions:
You know ho
I had the same thought. Kafka processor, KProcessor or even Kafka
stream processor is more relevant.
> On Jul 30, 2015, at 2:09 PM, Martin Kleppmann wrote:
>
> I'm with Sriram -- Kafka is all about streams already (or topics, to be
> precise, but we're calling it "stream processing" not "topic
I'm with Sriram -- Kafka is all about streams already (or topics, to be
precise, but we're calling it "stream processing" not "topic processing"), so I
find "Kafka Streams", "KStream" and "Kafka Streaming" all confusing, since they
seem to imply that other bits of Kafka are not about streams.
I
I would vote for KStream as it sounds sexier (is it only me??), second to
that would be Kafka Streaming.
On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps wrote:
> Also, the most important part of any prototype, we should have a name for
> this producing-consumer-thingamgigy:
>
> Various ideas:
> - Kaf
Since it sounds like it is not a separate framework (like CopyCat) but
rather a new client, it will be nice to follow existing convention.
Producer, Consumer and Processor (or Transformer) make sense to me.
Note that the way the API is currently described, people may want to
use it inside Spark ap
I think Kafka and streaming are synonymous. Kafka streams or Kafka
streaming really does not indicate "stream processing".
On Wed, Jul 29, 2015 at 6:20 PM, Neha Narkhede wrote:
> Prefer something that evokes "stream processing on top of Kafka". And since
> I've heard many people conflate "stream
Prefer something that evokes "stream processing on top of Kafka". And since
I've heard many people conflate "streaming" with "streaming video" (I know,
duh!), I'd vote for Kafka Streams or maybe KStream.
Thanks,
Neha
On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps wrote:
> Also, the most important
Also, the most important part of any prototype, we should have a name for
this producing-consumer-thingamgigy:
Various ideas:
- Kafka Streams
- KStream
- Kafka Streaming
- The Processor API
- Metamorphosis
- Transformer API
- Verwandlung
For my part I think what people are trying to do is stream
The second question that came up on the KIP was how do joins and
aggregations work. A lot of implicit thinking went into Kafka's data model
to support stream processing so there is an idea of how this should work
but it isn't exactly obvious. Let me go through the idea of how a processor
is meant t
Thanks Guozhang! Much clearer now, at least for me.
Few comments / questions:
1. Perhaps punctuate(int numRecords) would be a nice API addition; some
use cases have record-count-based windows rather than time-based ones.
2. The diagram for "Flexible partition distribution" shows two joins.
Is the ide
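The punctuate(int numRecords) suggestion above amounts to triggering on record count instead of stream time. A minimal Python sketch of that idea (illustrative only; the class and method names are hypothetical):

```python
class CountWindowProcessor:
    """Sketch of a count-based punctuate: fire once every
    num_records records instead of every T time units."""

    def __init__(self, num_records):
        self.num_records = num_records
        self.seen = 0
        self.windows_closed = 0

    def process(self, value):
        self.seen += 1
        if self.seen % self.num_records == 0:
            self.punctuate(self.num_records)

    def punctuate(self, num_records):
        # Close the current count-based window.
        self.windows_closed += 1

p = CountWindowProcessor(3)
for v in range(7):
    p.process(v)
print(p.windows_closed)  # -> 2
```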
Here is the link to the original prototype we started with. I wouldn't
focus too heavily on the details of this code or the API, but I think it
gives an idea of the lowest-level API, amount of code, etc. It was
basically a clone of Samza built on Kafka using the new consumer protocol
just to exp
I have updated the wiki page incorporating people's comments, please feel
free to take another look before today's meeting.
On Mon, Jul 27, 2015 at 11:19 PM, Yi Pan wrote:
> Hi, Jay,
>
> {quote}
> 1. Yeah we are going to try to generalize the partition management stuff.
> We'll get a wiki/JIRA u
Hi, Jay,
{quote}
1. Yeah we are going to try to generalize the partition management stuff.
We'll get a wiki/JIRA up for that. I think that gives what you want in
terms of moving partitioning to the client side.
{quote}
Great! I am looking forward to that.
{quote}
I think the key observation is th
Hi Adi,
Just to clarify, the cmdline tool would be used, as stated in the wiki
page, to run the client library "as a process", which is still far away
from a "service". It is just like what we have for kafka-console-producer,
kafka-console-consumer, kafka-mirror-maker, etc today.
Guozhang
On Mon
Hi, Neha,
{quote}
We do hope to include a DSL since that is the most natural way of
expressing stream processing operations on top of the processor client. The
DSL layer should be equivalent to that provided by Spark streaming or Flink
in terms of expressiveness though there will be differences in
Adi,
How far away are we from having a prototype patch to play with?
>
We are working to share a prototype next week. The code will evolve
to match the APIs and design as it shapes up, but it will be great if
people can take a look and provide feedback.
Couple of observations:
>
Hi, Aditya,
{quote}
- The KIP states that cmd line tools will be provided to deploy as a
separate service. Is the proposed scope limited to providing a library
which makes it possible to build stream-processing-as-a-service, or to
provide such a service within Kafka itself?
{quote}
There has alrea
Gwen,
We have a compilation of notes from comparison with other systems. They
might be missing details that folks who worked on that system might be able
to point out. We can share that and discuss further on the KIP call.
We do hope to include a DSL since that is the most natural way of
expressi
+1 on comparison with existing solutions. On a high level, it seems nice to
have a transform library inside Kafka.. a lot of the building blocks are
already there to build a stream processing framework. However, the details
are tricky to get right; I think this discussion will get a lot more
interest
Hi,
Since we will be discussing KIP-28 in the call tomorrow, can you
update the KIP with the feature-comparison with existing solutions?
I admit that I do not see a need for single-event-producer-consumer
pair (AKA Flume Interceptor). I've seen tons of people implement such
apps in the last year,
Hey Yi,
Great points. I think for some of this the most useful thing would be to
get a WIP prototype out that we could discuss concretely. I think Yasuhiro
and Guozhang took that prototype I had done, and had some improvements.
Give us a bit to get that into understandable shape so we can discuss.
Hi, Jay and all,
Thanks for all your quick responses. I tried to summarize my thoughts here:
- ConsumerRecord as stream processor API:
* This KafkaProcessor API is targeted at receiving messages from Kafka.
So, to Yasuhiro's join/transformation example, any join/transformation
results that a
On 24 Jul 2015 18:03, "Jay Kreps" wrote:
> Does this make sense to people? If so let's try it and if we like it
better
> we can formally make that the process for this kind of big thing.
Yes, sounds good to me.
Best,
Ismael
@guozhang re: library vs framework - Nothing critical missing. But I think
KIPs will serve as a sort of changelog since ones like this
https://archive.apache.org/dist/kafka/0.8.2.0/RELEASE_NOTES.html are not
all that helpful to end users. A KIP like this is a major new addition, so
laying out all t
Hey Yi,
For your other two points:
- This definitely doesn't cover any kind of SQL or anything like this.
- The prototype we started with just had process() as a method but Yasuhiro
had some ideas of adding additional filter/aggregate convenience methods.
We should discuss how this would fit wit
Jay, I understand that. Context can provide more information without
breaking the compatibility if needed. Also I am not sure ConsumerRecord is
the right abstraction of data for stream processing. After transformation
or join, what is the topic and the offset? It is odd to use ConsumerRecord.
We ca
Agree that the normal KIP process is awkward for larger changes like this.
I'm a +1 on trying out this new process for the processor client, see how
it works out and then make that a process for future large changes of this
nature.
On Fri, Jul 24, 2015 at 10:03 AM, Jay Kreps wrote:
> I agree tha
Hi Yi,
Inlined.
On Fri, Jul 24, 2015 at 12:57 AM, Yi Pan wrote:
> Hi, Guozhang,
>
> Thanks for starting this. I took a quick look and had the following
> thoughts to share:
>
> - In the proposed KafkaProcessor API, there is no interface like Collector
> that allows users to send messages to. Wh
The goal of this KIP is to provide a lightweight/embeddable streaming
framework that allows Kafka users to start using stream processing easily. A DSL
is not covered in this KIP, but a DSL is a very attractive option to have.
> In the proposed KafkaProcessor API, there is no interface like Collector
To follow on to one of Yi's points about taking ConsumerRecord vs
topic/key/value. One thing we have found is that for user-facing APIs
considering future API evolution is really important. If you do
topic/key/value and then realize you need the offset added, you end up having to
break everyone's code. T
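The evolution argument for passing a record object rather than positional topic/key/value arguments can be sketched in Python (illustrative only; the Record fields here are hypothetical, not the actual ConsumerRecord definition): a new field with a default can be added to the envelope without touching any existing process() implementation.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # A record envelope instead of process(topic, key, value):
    # new fields can be added with defaults without breaking callers.
    topic: str
    key: bytes
    value: bytes
    offset: int = -1  # added later; older process() impls are unaffected

def process(record):
    # Written before `offset` existed; still works unchanged,
    # because it always received a single Record argument.
    return (record.topic, record.key)

r = Record("clicks", b"k", b"v")
print(process(r))  # -> ('clicks', b'k')
```

With positional arguments, adding `offset` would have changed every process() signature; with the envelope, only the envelope grows.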
I agree that the KIP process doesn't fit well for big areas of development
like the new consumer, copycat, or this.
I think the approach for copycat where we do a "should this exist" KIP vote
followed by a review on code checkin isn't ideal because of course the
question of "should we do it" is di
Hi Jiangjie,
Inlined.
On Thu, Jul 23, 2015 at 11:32 PM, Jiangjie Qin
wrote:
> Hey Guozhang,
>
> I just took a quick look at the KIP, is it very similar to mirror maker
> with message handler?
>
>
I think the processor client would support a superset of functionalities
compared to MM with message ha
Hi Ewen,
Replies inlined.
On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava
wrote:
> Just some notes on the KIP doc itself:
>
> * It'd be useful to clarify at what point the plain consumer + custom code
> + producer breaks down. I think trivial filtering and aggregation on a
> single stre
Hi, Guozhang,
Thanks for starting this. I took a quick look and had the following
thoughts to share:
- In the proposed KafkaProcessor API, there is no interface like Collector
that allows users to send messages to. Why is that? Is the idea to
initialize the producer once and re-use it in the proc
Ewen:
* I think trivial filtering and aggregation on a single stream usually work
> fine with this model.
The way I see this, the process() API is an abstraction for
message-at-a-time computations. In the future, you could imagine providing
a simple DSL layer on top of the process() API that pro
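The point about a DSL layer on top of process() can be made concrete with a small Python sketch (hypothetical helper names, not a proposed API): filter and map are just pre-canned process() callbacks that forward to a downstream callback.

```python
# A thin DSL layer over a message-at-a-time process() callback:
# filter and map are just pre-canned process() implementations.
def make_filter(predicate, downstream):
    def process(key, value):
        if predicate(key, value):
            downstream(key, value)
    return process

def make_map(fn, downstream):
    def process(key, value):
        downstream(*fn(key, value))
    return process

out = []
sink = lambda k, v: out.append((k, v))
pipeline = make_filter(lambda k, v: v > 0,
                       make_map(lambda k, v: (k, v * 10), sink))
for k, v in [("a", 1), ("b", -2), ("c", 3)]:
    pipeline(k, v)
print(out)  # -> [('a', 10), ('c', 30)]
```

Each DSL operator composes by wrapping the next stage's process() callback, so the whole pipeline is still one message-at-a-time computation underneath.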
Hey Guozhang,
I just took a quick look at the KIP, is it very similar to mirror maker
with message handler?
Thanks,
Jiangjie (Becket) Qin
On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava
wrote:
> Just some notes on the KIP doc itself:
>
> * It'd be useful to clarify at what point the p
Just some notes on the KIP doc itself:
* It'd be useful to clarify at what point the plain consumer + custom code
+ producer breaks down. I think trivial filtering and aggregation on a
single stream usually work fine with this model. Anything where you need
more complex joins, windowing, etc. are