PS: Arvind, I think such questions should typically be sent to the kafka-user mailing list (not to kafka-dev).
On Mon, Sep 26, 2016 at 3:41 AM, Matthias J. Sax <matth...@confluent.io> wrote:

> Hi Arvind,
>
> Short answer: Kafka Streams definitely helps you!
>
> Long answer: Kafka Streams offers two layers for programming your
> stream processing job, the low-level Processor API and the high-level
> DSL. Please check the documentation for further details:
> http://docs.confluent.io/3.0.1/streams/index.html
>
> With the Processor API you can do anything -- at the cost of lower
> abstraction and thus more coding. I guess this would be the best way
> to program your Kafka Streams application for your use case.
>
> The DSL is easier to use and provides high-level abstractions --
> however, I am not sure whether it covers what you need in your use
> case. But maybe it's worth trying it out before falling back to the
> Processor API...
>
> For your second question, I would recommend using the Processor API
> and attaching a state store to a processor node, writing to your data
> warehouse whenever a state is "complete" (see
> http://docs.confluent.io/3.0.1/streams/developer-guide.html#defining-a-state-store).
>
> One more hint: you can actually mix the DSL and the Processor API
> (e.g., by using process() or transform() within the DSL).
>
> Hope this gives you some initial pointers. Please follow up if you
> have more questions.
>
> -Matthias
>
>
> On 09/24/2016 11:25 PM, Arvind Kalyan wrote:
> > Hello there!
> >
> > I read about Kafka Streams recently; it is pretty interesting how
> > it solves the stream processing problem in a cleaner way, with
> > fewer overheads and complexities.
> >
> > I work as a software engineer at a startup, and we are in the
> > design stage of building a stream processing pipeline (if you will)
> > for the millions of events we get every day. We have been using a
> > Kafka cluster as the log aggregation layer in production for 5-6
> > months already and are very happy with it.
> >
> > I went through a few Confluent blogs (by Jay, Neha) on how Kafka
> > Streams solves for stateful event processing, and maybe I missed
> > the whole point in this regard, but I have some doubts.
> >
> > We have use cases like the following:
> >
> > There is an event E1, which is sort of the base event, after which
> > we have a lot of sub-events E2, E3, ..., En enriching E1 with a lot
> > of extra properties (with considerable delay, say 30-40 minutes).
> >
> > Eg. 1: An order event comes in when a user orders an item on our
> > website (this is the base event). After say 30-40 minutes, we get
> > events like packaging_time, shipping_time, delivered_time or
> > cancelled_time etc. related to that order (these are the
> > sub-events).
> >
> > So before we get the whole event to a warehouse, we need to collect
> > all of these (ordered, packaged, shipped, cancelled/delivered), and
> > whenever I get a cancelled or delivered event for an order, I know
> > that completes the lifecycle for that order and can put it in the
> > warehouse.
> >
> > Eg. 2: User login events -- if we are to capture events like
> > User-Logged-In and User-Logged-Out, I need them to be in the
> > warehouse as a single row, e.g. a row with the columns *user_id,
> > login_time, logout_time*. So as and when I receive a logout event
> > (and if I have the login event stored in some store), there would
> > be a trigger that combines both and sends the result across to the
> > warehouse.
> >
> > All of these involve storing the state of the events and acting as
> > and when another event (one that completes the lifecycle) occurs,
> > after which you have a trigger for further steps (warehouse or
> > anything else).
> >
> > Does Kafka Streams help me do this? If not, how should I go about
> > solving this problem?
> >
> > Also, I wanted some advice on whether it is standard practice to
> > aggregate like this and *then* store to the warehouse, or whether I
> > should append each event to the warehouse and do sort of an ELT on
> > it using the warehouse (run a query to restructure the data in the
> > database and store it off as a separate table)?
> >
> > Eagerly waiting for your reply,
> > Arvind
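
To make the suggestion above concrete, here is a minimal sketch of the
state-store approach Matthias describes, applied to Arvind's Eg. 2 and
assuming the 0.10.x Processor API that ships with Confluent 3.0.1. The
event encoding (string values like "login:<ts>" / "logout:<ts>" keyed
by user id) and the store name "pending-logins" are made up for
illustration, not taken from the thread:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class LoginMergeProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> pendingLogins;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // The store must be attached to this processor node in the topology.
        this.pendingLogins =
                (KeyValueStore<String, String>) context.getStateStore("pending-logins");
    }

    @Override
    public void process(String userId, String event) {
        if (event.startsWith("login:")) {
            // Remember the login until the matching logout arrives.
            pendingLogins.put(userId, event.substring("login:".length()));
        } else if (event.startsWith("logout:")) {
            String loginTime = pendingLogins.get(userId);
            if (loginTime != null) {
                // Lifecycle complete: forward one merged record downstream,
                // e.g. to a sink topic that feeds the warehouse.
                String logoutTime = event.substring("logout:".length());
                context.forward(userId, loginTime + "," + logoutTime);
                pendingLogins.delete(userId);
            }
            // A logout with no stored login is silently dropped here.
        }
    }

    @Override
    public void punctuate(long timestamp) {
        // Not used in this sketch; could periodically expire stale entries.
    }

    @Override
    public void close() {}
}

A real implementation would need to handle late or out-of-order events.
Eg. 1 is analogous: accumulate an order's sub-events in the store and
forward the merged record once a cancelled or delivered event arrives.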
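
Wiring the processor into a topology follows the developer-guide
section linked above; the topic names ("user-events",
"completed-sessions") are again hypothetical:

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.TopologyBuilder;
import org.apache.kafka.streams.state.Stores;

public class LoginMergeApp {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("events-source", "user-events")
               .addProcessor("login-merger", LoginMergeProcessor::new, "events-source")
               // Attach the store so getStateStore("pending-logins")
               // works in init().
               .addStateStore(
                       Stores.create("pending-logins")
                             .withStringKeys()
                             .withStringValues()
                             .persistent()
                             .build(),
                       "login-merger")
               // Completed sessions land in a topic that a connector
               // can load into the warehouse.
               .addSink("warehouse-sink", "completed-sessions", "login-merger");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "login-merge-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.Serdes$StringSerde");

        new KafkaStreams(builder, props).start();
    }
}

Per the hint about mixing the two layers, the same logic could instead
be attached inside the DSL via KStream#process() or KStream#transform(),
passing "pending-logins" as the state store name.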