Shekar,

You won't be creating a partition per application. By using the application
name as the partitioning key you ensure all events for a given application
are consistently mapped to the same partition. Multiple applications will
be mapped to each partition without any need for a priori knowledge of the
set of applications.

You then maintain counts per application in your own code. Since all events
for an application are mapped to the same partition and only one worker
will consume a given partition your counts will be complete.

Make sense?


b

On Monday, June 29, 2015, Shekar Tippur <ctip...@gmail.com> wrote:

> Milinda,
>
> This is a stream of events where I dont know how many applications are
> sending events. I need to dynamically create Kafka partitions.
> Can you please confirm the flow:
> 1. New event comes in
> 2. Check to see if a partition exists for the application. If not create
> one.
> 3. Implement public static final SystemStream OUTPUT_STREAM =
>
>     new SystemStream("kafka", "Application name");
>
> 4. Apply windowing on the new topic
>
> public void window(MessageCollector collector,
>                      TaskCoordinator coordinator) {
>     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, eventsSeen));
>     eventsSeen = 0;
>   }
>
>
> On Mon, Jun 29, 2015 at 7:54 AM, Milinda Pathirage <mpath...@umail.iu.edu
> <javascript:;>>
> wrote:
>
> > Hi Shekar,
> >
> > You can use Kafka's partitioning capabilities to partition your stream
> > based on application. That will make sure events related to a application
> > will always ended up in same partition. With this you will have multiple
> > applications in same partition and each partition will be mapped to a
> > single Samza task instance (AFAIK, Samza job is devided into several task
> > instances based on number of partitions in your topic.). Then in your
> Samza
> > task implementation you should maintain windowing counts for each
> > application.
> >
> > Thanks
> > Milinda
> >
> > On Sun, Jun 28, 2015 at 8:48 PM, Shekar Tippur <ctip...@gmail.com
> <javascript:;>> wrote:
> >
> > > Milinda,
> > >
> > > I see that the document you mentioned addresses windowing but I also
> need
> > > to group by different applications.
> > >
> > > Application    Count
> > > ---------------    --------
> > > A                    100
> > > B                    40
> > > C                    69
> > > ....
> > >
> > > - Shekar
> > >
> > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com
> <javascript:;>>
> > wrote:
> > >
> > > > Never mind. I see it here:
> > > >
> > > >
> > http://samza.apache.org/learn/documentation/0.8/container/windowing.html
> > > >
> > > > Thanks again Milinda.
> > > >
> > > > - Shekar
> > > >
> > > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com
> <javascript:;>>
> > > wrote:
> > > >
> > > >> Thanks Milinda.
> > > >> Is this feature available on 0.8 version of Samza?
> > > >>
> > > >> - Shekar
> > > >>
> > > >> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <
> > > >> mpath...@umail.iu.edu <javascript:;>> wrote:
> > > >>
> > > >>> Hi Shekar,
> > > >>>
> > > >>> You can use Samza's local storage (
> > > >>>
> > > >>>
> > >
> >
> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
> > > >>> )
> > > >>> to keep the window state and windowing (
> > > >>>
> > >
> http://samza.apache.org/learn/documentation/0.9/container/windowing.html
> > > >>> )
> > > >>> capabilities to handle the window advancement. During advancement
> you
> > > can
> > > >>> update the local cache (Redis in your case). AFAIK, Samza doesn't
> > > provide
> > > >>> any helpers or utilities to handle window state maintenance. You
> have
> > > to
> > > >>> implement it on top of local storage or if you don't won't fault
> > > >>> tolerance
> > > >>> you can keep the state in-memory too (as long as the state fit in
> > > >>> memory).
> > > >>>
> > > >>> Thanks
> > > >>> Milinda
> > > >>>
> > > >>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com
> <javascript:;>>
> > > >>> wrote:
> > > >>>
> > > >>> > Yan,
> > > >>> >
> > > >>> >
> > > >>> > *What do you mean by "a local cache"? Is it a db like MySQL,
> > > something
> > > >>> > likeRocksDB, or even just in-memory?*
> > > >>> >
> > > >>> > Local cache as in Redis
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > *When you say "another topic", is this the topic consumed by the
> > same
> > > >>> > Samzajob as your 5-minutes-job, or in a separate job? What is the
> > > >>> > relationbetween the topic and the application name*
> > > >>> >
> > > >>> > We dont have a 5 min job. All we have now is a stream of events
> > > coming
> > > >>> from
> > > >>> > a bunch of applications. All these land on a raw kafka topic. The
> > > >>> stream
> > > >>> > data has application name. I want to create a job that takes
> > incoming
> > > >>> > stream and group it by application name and count the number of
> > > events
> > > >>> we
> > > >>> > get in a 5 min sliding window.
> > > >>> >
> > > >>> > - Shekar
> > > >>> >
> > > >>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com
> <javascript:;>>
> > > >>> wrote:
> > > >>> >
> > > >>> > > Hi Shekar,
> > > >>> > >
> > > >>> > > Need a little more clarification.
> > > >>> > >
> > > >>> > > What do you mean by "a local cache"? Is it a db like MySQL,
> > > something
> > > >>> > like
> > > >>> > > RocksDB, or even just in-memory?
> > > >>> > >
> > > >>> > > When you say "another topic", is this the topic consumed by the
> > > same
> > > >>> > Samza
> > > >>> > > job as your 5-minutes-job, or in a separate job? What is the
> > > relation
> > > >>> > > between the topic and the application name?
> > > >>> > >
> > > >>> > > Thanks,
> > > >>> > >
> > > >>> > > Fang, Yan
> > > >>> > > yanfang...@gmail.com <javascript:;>
> > > >>> > >
> > > >>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <
> > ctip...@gmail.com <javascript:;>>
> > > >>> > wrote:
> > > >>> > >
> > > >>> > > > Hello,
> > > >>> > > > My apologies if I have raised it earlier.
> > > >>> > > > Here is the use case:
> > > >>> > > > I have a stream that is partitioned based on application
> name.
> > I
> > > >>> want
> > > >>> > to
> > > >>> > > be
> > > >>> > > > able to count hte number of events happening for that
> > particular
> > > >>> > > > application in the past 5 minutes (sliding window) and update
> > > >>> either
> > > >>> > > > another topic or a local cache.
> > > >>> > > >
> > > >>> > > > Is this possible via 0.9 version of Samza?
> > > >>> > > > If not, what is the easiest way to achieve this?
> > > >>> > > >
> > > >>> > > > - Shekar
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Milinda Pathirage
> > > >>>
> > > >>> PhD Student | Research Assistant
> > > >>> School of Informatics and Computing | Data to Insight Center
> > > >>> Indiana University
> > > >>>
> > > >>> twitter: milindalakmal
> > > >>> skype: milinda.pathirage
> > > >>> blog: http://milinda.pathirage.org
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
> >
>

Reply via email to