Milinda, This is a stream of events where I dont know how many applications are sending events. I need to dynamically create Kafka partitions. Can you please confirm the flow: 1. New event comes in 2. Check to see if a partition exists for the application. If not create one. 3. Implement public static final SystemStream OUTPUT_STREAM =
new SystemStream("kafka", "Application name"); 4. Apply windowing on the new topic public void window(MessageCollector collector, TaskCoordinator coordinator) { collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, eventsSeen)); eventsSeen = 0; } On Mon, Jun 29, 2015 at 7:54 AM, Milinda Pathirage <mpath...@umail.iu.edu> wrote: > Hi Shekar, > > You can use Kafka's partitioning capabilities to partition your stream > based on application. That will make sure events related to a application > will always ended up in same partition. With this you will have multiple > applications in same partition and each partition will be mapped to a > single Samza task instance (AFAIK, Samza job is devided into several task > instances based on number of partitions in your topic.). Then in your Samza > task implementation you should maintain windowing counts for each > application. > > Thanks > Milinda > > On Sun, Jun 28, 2015 at 8:48 PM, Shekar Tippur <ctip...@gmail.com> wrote: > > > Milinda, > > > > I see that the document you mentioned addresses windowing but I also need > > to group by different applications. > > > > Application Count > > --------------- -------- > > A 100 > > B 40 > > C 69 > > .... > > > > - Shekar > > > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> > wrote: > > > > > Never mind. I see it here: > > > > > > > http://samza.apache.org/learn/documentation/0.8/container/windowing.html > > > > > > Thanks again Milinda. > > > > > > - Shekar > > > > > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> > > wrote: > > > > > >> Thanks Milinda. > > >> Is this feature available on 0.8 version of Samza? > > >> > > >> - Shekar > > >> > > >> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage < > > >> mpath...@umail.iu.edu> wrote: > > >> > > >>> Hi Shekar, > > >>> > > >>> You can use Samza's local storage ( > > >>> > > >>> > > > http://samza.apache.org/learn/documentation/0.9/container/state-management.html > > >>> ) > > >>> to keep the window state and windowing ( > > >>> > > http://samza.apache.org/learn/documentation/0.9/container/windowing.html > > >>> ) > > >>> capabilities to handle the window advancement. During advancement you > > can > > >>> update the local cache (Redis in your case). AFAIK, Samza doesn't > > provide > > >>> any helpers or utilities to handle window state maintenance. You have > > to > > >>> implement it on top of local storage or if you don't won't fault > > >>> tolerance > > >>> you can keep the state in-memory too (as long as the state fit in > > >>> memory). > > >>> > > >>> Thanks > > >>> Milinda > > >>> > > >>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com> > > >>> wrote: > > >>> > > >>> > Yan, > > >>> > > > >>> > > > >>> > *What do you mean by "a local cache"? Is it a db like MySQL, > > something > > >>> > likeRocksDB, or even just in-memory?* > > >>> > > > >>> > Local cache as in Redis > > >>> > > > >>> > > > >>> > > > >>> > *When you say "another topic", is this the topic consumed by the > same > > >>> > Samzajob as your 5-minutes-job, or in a separate job? What is the > > >>> > relationbetween the topic and the application name* > > >>> > > > >>> > We dont have a 5 min job. All we have now is a stream of events > > coming > > >>> from > > >>> > a bunch of applications. All these land on a raw kafka topic. The > > >>> stream > > >>> > data has application name. I want to create a job that takes > incoming > > >>> > stream and group it by application name and count the number of > > events > > >>> we > > >>> > get in a 5 min sliding window. > > >>> > > > >>> > - Shekar > > >>> > > > >>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com> > > >>> wrote: > > >>> > > > >>> > > Hi Shekar, > > >>> > > > > >>> > > Need a little more clarification. > > >>> > > > > >>> > > What do you mean by "a local cache"? Is it a db like MySQL, > > something > > >>> > like > > >>> > > RocksDB, or even just in-memory? > > >>> > > > > >>> > > When you say "another topic", is this the topic consumed by the > > same > > >>> > Samza > > >>> > > job as your 5-minutes-job, or in a separate job? What is the > > relation > > >>> > > between the topic and the application name? > > >>> > > > > >>> > > Thanks, > > >>> > > > > >>> > > Fang, Yan > > >>> > > yanfang...@gmail.com > > >>> > > > > >>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur < > ctip...@gmail.com> > > >>> > wrote: > > >>> > > > > >>> > > > Hello, > > >>> > > > My apologies if I have raised it earlier. > > >>> > > > Here is the use case: > > >>> > > > I have a stream that is partitioned based on application name. > I > > >>> want > > >>> > to > > >>> > > be > > >>> > > > able to count hte number of events happening for that > particular > > >>> > > > application in the past 5 minutes (sliding window) and update > > >>> either > > >>> > > > another topic or a local cache. > > >>> > > > > > >>> > > > Is this possible via 0.9 version of Samza? > > >>> > > > If not, what is the easiest way to achieve this? > > >>> > > > > > >>> > > > - Shekar > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > > >>> > > >>> -- > > >>> Milinda Pathirage > > >>> > > >>> PhD Student | Research Assistant > > >>> School of Informatics and Computing | Data to Insight Center > > >>> Indiana University > > >>> > > >>> twitter: milindalakmal > > >>> skype: milinda.pathirage > > >>> blog: http://milinda.pathirage.org > > >>> > > >> > > >> > > > > > > > > > -- > Milinda Pathirage > > PhD Student | Research Assistant > School of Informatics and Computing | Data to Insight Center > Indiana University > > twitter: milindalakmal > skype: milinda.pathirage > blog: http://milinda.pathirage.org >