PS: Arvind, I think typically such questions should be sent to the
kafka-user mailing list (not to kafka-dev).

On Mon, Sep 26, 2016 at 3:41 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Hi Arvind,
>
> short answer: Kafka Streams can definitely help you!
>
> Long answer: Kafka Streams offers two layers for programming your stream
> processing job, the low-level Processor API and the high-level DSL.
> Please check the documentation for further details:
> http://docs.confluent.io/3.0.1/streams/index.html
>
> With the Processor API you are able to do anything -- at the cost of a
> lower abstraction level and thus more coding. I guess this would be the
> best way to program your Kafka Streams application for your use case.
>
> The DSL is easier to use and provides high-level abstractions --
> however, I am not sure if it covers what you need in your use case. But
> maybe it's worth trying it out before using the Processor API...
>
> For your second question, I would recommend using the Processor API and
> attaching a state store to a processor node, then writing to your data
> warehouse whenever a state is "complete" (see
> http://docs.confluent.io/3.0.1/streams/developer-guide.html#defining-a-state-store).
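For illustration, the collect-and-emit pattern described above can be sketched independently of the Kafka Streams API. This is a minimal Python model of the logic only; in a real application the plain dict below would be a persistent state store attached to a Processor node, and the event field names (`order_id`, `type`, `ts`) are assumptions for the sketch:

```python
# Sketch of "attach a state store, emit when the lifecycle is complete".
# The dict `store` stands in for a Kafka Streams state store; field names
# are hypothetical.

TERMINAL_TYPES = {"cancelled", "delivered"}  # assumed lifecycle-ending events

def process(store, event):
    """Merge sub-events per order; return the full record once complete."""
    order_id = event["order_id"]
    record = store.setdefault(order_id, {})
    record[event["type"]] = event["ts"]
    if event["type"] in TERMINAL_TYPES:
        # Lifecycle complete: remove from the store and forward downstream
        # (here, to the warehouse).
        return store.pop(order_id)
    return None  # still waiting for more sub-events

store = {}
assert process(store, {"order_id": 1, "type": "ordered", "ts": 100}) is None
assert process(store, {"order_id": 1, "type": "packaged", "ts": 130}) is None
done = process(store, {"order_id": 1, "type": "delivered", "ts": 160})
assert done == {"ordered": 100, "packaged": 130, "delivered": 160}
```

A Processor implementation would do the same thing inside `process()`, with `punctuate()` available to expire orders that never receive a terminal event.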
>
> One more hint: you can actually mix the DSL and the Processor API by
> using e.g. process() or transform() within the DSL.
>
>
> Hope this gives you some initial pointers. Please follow up if you have
> more questions.
>
> -Matthias
>
>
> On 09/24/2016 11:25 PM, Arvind Kalyan wrote:
> > Hello there!
> >
> > I read about Kafka Streams recently; it's pretty interesting the way
> > it solves the stream processing problem in a cleaner way, with fewer
> > overheads and complexities.
> >
> > I work as a Software Engineer at a startup, and we are in the design
> > stage of building a stream processing pipeline (if you will) for the
> > millions of events we get every day. We have been using Kafka (a
> > cluster) as the log aggregation layer in production for 5-6 months
> > already and are very happy with it.
> >
> > I went through a few Confluent blogs (by Jay, Neha) on how Kafka
> > Streams solves for a sort of stateful event processing; maybe I missed
> > the whole point in this regard, but I have some doubts.
> >
> > We have use-cases like the following:
> >
> > There is an event E1, which is sort of the base event, after which we
> > have a lot of sub-events E2, E3, ..., En enriching E1 with a lot of
> > extra properties (with considerable delay, say 30-40 mins).
> >
> > Eg. 1: An order event comes in where the user has ordered an item on
> > our website (this is the base event). After say 30-40 minutes, we get
> > events like packaging_time, shipping_time, delivered_time or
> > cancelled_time etc. related to that order (these are the sub-events).
> >
> > So before we get the whole event to a warehouse, we need to collect
> > all of these (ordered, packaged, shipped, cancelled/delivered), and
> > whenever I get a cancelled or delivered event for an order, I know
> > that completes the lifecycle for that order and I can put it in the
> > warehouse.
> >
> > Eg. 2: User login events - if we are to capture events like
> > User-Logged-In and User-Logged-Out, I need them to be in the warehouse
> > as a single row, e.g. a row with the columns *user_id, login_time,
> > logout_time*. So as and when I receive a logout event (and if I have
> > the login event stored in some store), there would be a trigger which
> > combines both and sends the result across to the warehouse.
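The login/logout pairing described in Eg. 2 follows the same stateful pattern; a minimal Python sketch of just the merging logic (the event tuple shape and field names are assumptions, and the dict stands in for a durable state store):

```python
# Sketch of Eg. 2: pair a login with its later logout into one output row.
# `pending` stands in for a state store keyed by user_id.

def handle(pending, event):
    """Buffer logins; on a matching logout, emit the combined row."""
    uid, etype, ts = event
    if etype == "login":
        pending[uid] = ts  # remember the login until its logout arrives
        return None
    if etype == "logout" and uid in pending:
        # Trigger: both halves present, combine into one warehouse row.
        return {"user_id": uid,
                "login_time": pending.pop(uid),
                "logout_time": ts}
    return None  # logout without a known login (or unknown event type)

pending = {}
assert handle(pending, ("u1", "login", 10)) is None
row = handle(pending, ("u1", "logout", 55))
assert row == {"user_id": "u1", "login_time": 10, "logout_time": 55}
```

In practice you would also decide what to do with logouts that arrive without a stored login (late data, restarts), which is exactly where a fault-tolerant state store pays off.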
> >
> > All these involve storing the state of the events and acting as and
> > when another event (one that completes the lifecycle) occurs, after
> > which you have a trigger for further steps (warehouse or anything
> > else).
> >
> > Does Kafka Streams help me do this? If not, how should I go about
> > solving this problem?
> >
> > Also, I wanted some advice as to whether it is standard practice to
> > aggregate like this and *then* store to the warehouse, or whether I
> > should append each event to the warehouse and do sort of an ELT on it
> > using the warehouse (run a query to restructure the data in the
> > database and store it off as a separate table).
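The ELT alternative mentioned above can be sketched as well: append every raw event as its own row, then restructure inside the warehouse. This Python model uses a group-by to stand in for the warehouse query; the row shape is assumed:

```python
# Sketch of the ELT alternative: land raw events as individual rows,
# then pivot them per order with a query (modeled here as a group-by).
from collections import defaultdict

raw_rows = [  # hypothetical raw event rows as appended to the warehouse
    {"order_id": 1, "type": "ordered", "ts": 100},
    {"order_id": 1, "type": "packaged", "ts": 130},
    {"order_id": 1, "type": "delivered", "ts": 160},
]

# The "T" step: one wide row per order, one column per event type.
pivoted = defaultdict(dict)
for r in raw_rows:
    pivoted[r["order_id"]][r["type"]] = r["ts"]

assert pivoted[1] == {"ordered": 100, "packaged": 130, "delivered": 160}
```

The trade-off is roughly stream-side state (results are ready as soon as the lifecycle completes) versus warehouse-side compute (simpler ingestion, but the pivot query must run periodically).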
> >
> > Eagerly waiting for your reply,
> > Arvind
> >
>
>
