Milinda,

This is a stream of events where I dont know how many applications are
sending events. I need to dynamically create Kafka partitions.
Can you please confirm the flow:
1. New event comes in
2. Check to see if a partition exists for the application. If not create
one.
3. Implement public static final SystemStream OUTPUT_STREAM =

    new SystemStream("kafka", "Application name");

4. Apply windowing on the new topic

public void window(MessageCollector collector,
                     TaskCoordinator coordinator) {
    collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, eventsSeen));
    eventsSeen = 0;
  }


On Mon, Jun 29, 2015 at 7:54 AM, Milinda Pathirage <mpath...@umail.iu.edu>
wrote:

> Hi Shekar,
>
> You can use Kafka's partitioning capabilities to partition your stream
> based on application. That will make sure events related to a application
> will always ended up in same partition. With this you will have multiple
> applications in same partition and each partition will be mapped to a
> single Samza task instance (AFAIK, Samza job is devided into several task
> instances based on number of partitions in your topic.). Then in your Samza
> task implementation you should maintain windowing counts for each
> application.
>
> Thanks
> Milinda
>
> On Sun, Jun 28, 2015 at 8:48 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Milinda,
> >
> > I see that the document you mentioned addresses windowing but I also need
> > to group by different applications.
> >
> > Application    Count
> > ---------------    --------
> > A                    100
> > B                    40
> > C                    69
> > ....
> >
> > - Shekar
> >
> > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com>
> wrote:
> >
> > > Never mind. I see it here:
> > >
> > >
> http://samza.apache.org/learn/documentation/0.8/container/windowing.html
> > >
> > > Thanks again Milinda.
> > >
> > > - Shekar
> > >
> > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com>
> > wrote:
> > >
> > >> Thanks Milinda.
> > >> Is this feature available on 0.8 version of Samza?
> > >>
> > >> - Shekar
> > >>
> > >> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <
> > >> mpath...@umail.iu.edu> wrote:
> > >>
> > >>> Hi Shekar,
> > >>>
> > >>> You can use Samza's local storage (
> > >>>
> > >>>
> >
> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
> > >>> )
> > >>> to keep the window state and windowing (
> > >>>
> > http://samza.apache.org/learn/documentation/0.9/container/windowing.html
> > >>> )
> > >>> capabilities to handle the window advancement. During advancement you
> > can
> > >>> update the local cache (Redis in your case). AFAIK, Samza doesn't
> > provide
> > >>> any helpers or utilities to handle window state maintenance. You have
> > to
> > >>> implement it on top of local storage or if you don't won't fault
> > >>> tolerance
> > >>> you can keep the state in-memory too (as long as the state fit in
> > >>> memory).
> > >>>
> > >>> Thanks
> > >>> Milinda
> > >>>
> > >>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > Yan,
> > >>> >
> > >>> >
> > >>> > *What do you mean by "a local cache"? Is it a db like MySQL,
> > something
> > >>> > likeRocksDB, or even just in-memory?*
> > >>> >
> > >>> > Local cache as in Redis
> > >>> >
> > >>> >
> > >>> >
> > >>> > *When you say "another topic", is this the topic consumed by the
> same
> > >>> > Samzajob as your 5-minutes-job, or in a separate job? What is the
> > >>> > relationbetween the topic and the application name*
> > >>> >
> > >>> > We dont have a 5 min job. All we have now is a stream of events
> > coming
> > >>> from
> > >>> > a bunch of applications. All these land on a raw kafka topic. The
> > >>> stream
> > >>> > data has application name. I want to create a job that takes
> incoming
> > >>> > stream and group it by application name and count the number of
> > events
> > >>> we
> > >>> > get in a 5 min sliding window.
> > >>> >
> > >>> > - Shekar
> > >>> >
> > >>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> > > Hi Shekar,
> > >>> > >
> > >>> > > Need a little more clarification.
> > >>> > >
> > >>> > > What do you mean by "a local cache"? Is it a db like MySQL,
> > something
> > >>> > like
> > >>> > > RocksDB, or even just in-memory?
> > >>> > >
> > >>> > > When you say "another topic", is this the topic consumed by the
> > same
> > >>> > Samza
> > >>> > > job as your 5-minutes-job, or in a separate job? What is the
> > relation
> > >>> > > between the topic and the application name?
> > >>> > >
> > >>> > > Thanks,
> > >>> > >
> > >>> > > Fang, Yan
> > >>> > > yanfang...@gmail.com
> > >>> > >
> > >>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <
> ctip...@gmail.com>
> > >>> > wrote:
> > >>> > >
> > >>> > > > Hello,
> > >>> > > > My apologies if I have raised it earlier.
> > >>> > > > Here is the use case:
> > >>> > > > I have a stream that is partitioned based on application name.
> I
> > >>> want
> > >>> > to
> > >>> > > be
> > >>> > > > able to count hte number of events happening for that
> particular
> > >>> > > > application in the past 5 minutes (sliding window) and update
> > >>> either
> > >>> > > > another topic or a local cache.
> > >>> > > >
> > >>> > > > Is this possible via 0.9 version of Samza?
> > >>> > > > If not, what is the easiest way to achieve this?
> > >>> > > >
> > >>> > > > - Shekar
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Milinda Pathirage
> > >>>
> > >>> PhD Student | Research Assistant
> > >>> School of Informatics and Computing | Data to Insight Center
> > >>> Indiana University
> > >>>
> > >>> twitter: milindalakmal
> > >>> skype: milinda.pathirage
> > >>> blog: http://milinda.pathirage.org
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>

Reply via email to