Arvind,

your use cases sound very similar to what Bobby Calderwood recently
presented at StrangeLoop this year:

Commander: Better Distributed Applications through CQRS, Event Sourcing,
and Immutable Logs
https://speakerdeck.com/bobbycalderwood/commander-better-distributed-applications-through-cqrs-event-sourcing-and-immutable-logs
(The link also includes pointers to the recorded talk and example code.)

See also
http://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
for a higher-level introduction to such architectures.

Hope this helps!
Michael





On Mon, Sep 26, 2016 at 11:02 AM, Michael Noll <mich...@confluent.io> wrote:

> PS: Arvind, I think typically such questions should be sent to the
> kafka-user mailing list (not to kafka-dev).
>
> On Mon, Sep 26, 2016 at 3:41 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Hi Arvind,
>>
>> Short answer: yes, Kafka Streams can definitely help you here!
>>
>> Long answer: Kafka Streams offers two layers for programming your stream
>> processing job: the low-level Processor API and the high-level DSL.
>> Please check the documentation for further details:
>> http://docs.confluent.io/3.0.1/streams/index.html
>>
>> With the Processor API you are able to do anything -- at the cost of a
>> lower abstraction level and thus more coding. I guess this would be the
>> best way to program your Kafka Streams application for your use case.
>>
>> The DSL is easier to use and provides high-level abstractions --
>> however, I am not sure if it covers what you need for your use case. But
>> maybe it's worth trying it out before falling back to the Processor API...
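>>
>> To give a feel for the DSL, here is a minimal, hypothetical sketch: the
>> topic names (e.g. "order-events") and the filter condition are made up,
>> and it targets a recent Kafka Streams API (the 0.10.x release referenced
>> above uses KStreamBuilder instead of StreamsBuilder). It reads a topic,
>> filters it, and writes the result back to Kafka:
>>
>>   // Minimal DSL sketch (hypothetical topics and logic).
>>   import java.util.Properties;
>>   import org.apache.kafka.common.serialization.Serdes;
>>   import org.apache.kafka.streams.KafkaStreams;
>>   import org.apache.kafka.streams.StreamsBuilder;
>>   import org.apache.kafka.streams.StreamsConfig;
>>   import org.apache.kafka.streams.kstream.KStream;
>>
>>   public class DslSketch {
>>     public static void main(final String[] args) {
>>       final Properties props = new Properties();
>>       props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dsl-sketch");
>>       props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>>       props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
>>                 Serdes.String().getClass());
>>       props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
>>                 Serdes.String().getClass());
>>
>>       final StreamsBuilder builder = new StreamsBuilder();
>>       final KStream<String, String> events = builder.stream("order-events");
>>       // Keep only events that mark an order as delivered and write them
>>       // to another topic.
>>       events.filter((orderId, payload) -> payload.contains("delivered"))
>>             .to("delivered-orders");
>>
>>       final KafkaStreams streams = new KafkaStreams(builder.build(), props);
>>       streams.start();
>>     }
>>   }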
>>
>> For your second question, I would recommend using the Processor API and
>> attaching a state store to a processor node, and then writing to your
>> data warehouse whenever a state is "complete" (see
>> http://docs.confluent.io/3.0.1/streams/developer-guide.html#defining-a-state-store).
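>>
>> As a rough illustration (the class, store, and topic names here are made
>> up, the merge/completeness logic is just a placeholder, and it is written
>> against a recent Kafka Streams API -- the 0.10.x release uses
>> Processor<K, V> and TopologyBuilder instead), such a processor could
>> buffer sub-events per order in a state store and forward the merged
>> record to an output topic once a terminal event arrives; a connector or
>> loader could then move that topic into the warehouse:
>>
>>   import org.apache.kafka.common.serialization.Serdes;
>>   import org.apache.kafka.streams.Topology;
>>   import org.apache.kafka.streams.processor.api.Processor;
>>   import org.apache.kafka.streams.processor.api.ProcessorContext;
>>   import org.apache.kafka.streams.processor.api.Record;
>>   import org.apache.kafka.streams.state.KeyValueStore;
>>   import org.apache.kafka.streams.state.Stores;
>>
>>   // Buffers sub-events per order key and emits the merged record once
>>   // the order lifecycle is complete.
>>   public class OrderLifecycleProcessor
>>       implements Processor<String, String, String, String> {
>>
>>     private ProcessorContext<String, String> context;
>>     private KeyValueStore<String, String> store;
>>
>>     @Override
>>     public void init(final ProcessorContext<String, String> context) {
>>       this.context = context;
>>       this.store = context.getStateStore("order-store");
>>     }
>>
>>     @Override
>>     public void process(final Record<String, String> event) {
>>       final String previous = store.get(event.key());
>>       // Placeholder merge: append the new sub-event to what we have so far.
>>       final String merged =
>>           previous == null ? event.value() : previous + "|" + event.value();
>>       if (event.value().contains("delivered")
>>           || event.value().contains("cancelled")) {
>>         // Lifecycle complete: emit downstream and clean up the store.
>>         context.forward(event.withValue(merged));
>>         store.delete(event.key());
>>       } else {
>>         store.put(event.key(), merged);
>>       }
>>     }
>>
>>     // Wires the processor and its state store into a topology.
>>     public static Topology buildTopology() {
>>       final Topology topology = new Topology();
>>       topology.addSource("Source", "order-events");
>>       topology.addProcessor("Lifecycle", OrderLifecycleProcessor::new, "Source");
>>       topology.addStateStore(
>>           Stores.keyValueStoreBuilder(
>>               Stores.persistentKeyValueStore("order-store"),
>>               Serdes.String(), Serdes.String()),
>>           "Lifecycle");
>>       topology.addSink("Sink", "completed-orders", "Lifecycle");
>>       return topology;
>>     }
>>   }
>>
>> The resulting Topology would then be run with new KafkaStreams(topology,
>> props), using the same kind of configuration as in the DSL sketch above.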
>>
>> One more hint: you can actually mix the DSL and the Processor API by
>> using process() or transform() within the DSL.
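>>
>> For example, reusing the hypothetical OrderLifecycleProcessor and store
>> name from the sketch above (and noting that the exact
>> process()/transform() signatures differ between Kafka Streams versions),
>> dropping from the DSL into the Processor API could look roughly like this:
>>
>>   // Minimal sketch: attach the custom processor to a DSL stream.
>>   // Imports as in the earlier sketches.
>>   final StreamsBuilder builder = new StreamsBuilder();
>>   builder.addStateStore(
>>       Stores.keyValueStoreBuilder(
>>           Stores.persistentKeyValueStore("order-store"),
>>           Serdes.String(), Serdes.String()));
>>   builder.<String, String>stream("order-events")
>>          // process() hands each record to the custom processor, which
>>          // can use the attached "order-store" state store.
>>          .process(OrderLifecycleProcessor::new, "order-store")
>>          .to("completed-orders");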
>>
>>
>> Hope this gives you some initial pointers. Please follow up if you have
>> more questions.
>>
>> -Matthias
>>
>>
>> On 09/24/2016 11:25 PM, Arvind Kalyan wrote:
>> > Hello there!
>> >
>> > I read about Kafka Streams recently; it is pretty interesting how it
>> > solves the stream processing problem in a cleaner way with less
>> > overhead and complexity.
>> >
>> > I work as a Software Engineer at a startup, and we are in the design
>> > stage for building a stream processing pipeline (if you will) for the
>> > millions of events we get every day. We have been using a Kafka cluster
>> > as the log aggregation layer in production for 5-6 months already and
>> > are very happy with it.
>> >
>> > I went through a few Confluent blogs (by Jay, Neha) about how KStreams
>> > solves stateful event processing, but maybe I missed the whole point in
>> > this regard, so I have some doubts.
>> >
>> > We have use-cases like the following:
>> >
>> > There is an event E1, which is sort of the base event, after which we
>> > have a lot of sub-events E2, E3, ..., En enriching E1 with a lot of
>> > extra properties (with considerable delay, say 30-40 mins).
>> >
>> > Eg. 1: An order event comes in when a user orders an item on our
>> > website (this is the base event). After, say, 30-40 minutes, we get
>> > events like packaging_time, shipping_time, delivered_time or
>> > cancelled_time etc. related to that order (these are the sub-events).
>> >
>> > So before we get the whole event into the warehouse, we need to collect
>> > all of these (ordered, packaged, shipped, cancelled/delivered), and
>> > whenever I get a cancelled or delivered event for an order, I know that
>> > completes the lifecycle for that order and I can put it in the warehouse.
>> >
>> > Eg. 2: User login events -- if we capture events like User-Logged-In
>> > and User-Logged-Out, I need them to end up in the warehouse as a single
>> > row, e.g. a row with the columns *user_id, login_time, logout_time*. So
>> > as and when I receive a logout event (and if I have the login event
>> > stored in some store), there would be a trigger which combines both and
>> > sends the result across to the warehouse.
>> >
>> > All of these involve storing the state of the events and acting as and
>> > when another event (one that completes the lifecycle) occurs, after
>> > which you have a trigger for further steps (the warehouse or anything
>> > else).
>> >
>> > Does KStream help me do this? If not, how should I go about solving this
>> > problem?
>> >
>> > Also, I wanted some advice as to whether it is standard practice to
>> > aggregate like this and *then* store to the warehouse, or whether I
>> > should append each event to the warehouse and do sort of an ELT on it
>> > inside the warehouse (run a query to re-structure the data in the
>> > database and store it off as a separate table).
>> >
>> > Eagerly waiting for your reply,
>> > Arvind
>> >
>>
>>
>
>
>
