Hi David,

Here's an example of something similar to what you're talking about:
https://github.com/jgrier/FilteringExample
Have a look at the TweetImpressionFilteringJob.

-Jamie

On Tue, Mar 22, 2016 at 2:24 PM, David Brelloch <brell...@gmail.com> wrote:

> Konstantin,
>
> Not a problem. Thanks for pointing me in the right direction.
>
> David
>
> On Tue, Mar 22, 2016 at 5:17 PM, Konstantin Knauf
> <konstantin.kn...@tngtech.com> wrote:
>
>> Hi David,
>>
>> Interesting use case. I think this can be done nicely with a comap. Let
>> me know if you run into problems; unfortunately I am not aware of any
>> open-source examples.
>>
>> Cheers,
>>
>> Konstantin
>>
>> On 22.03.2016 21:07, David Brelloch wrote:
>> > Konstantin,
>> >
>> > For now the jobs will largely just involve incrementing or
>> > decrementing based on the JSON message coming in. We will probably
>> > look at adding windowing later, but for now that isn't a huge
>> > priority.
>> >
>> > As an example of what we are looking to do, let's say the following
>> > 3 messages were read from the Kafka topic:
>> >
>> > {"customerid": 1, "event": "addAdmin"}
>> > {"customerid": 1, "event": "addAdmin"}
>> > {"customerid": 1, "event": "deleteAdmin"}
>> >
>> > If the customer with id 1 had said they care about that type of
>> > message, we would expect to be tracking the number of admins and
>> > notify them that they currently have 2. The events are obviously much
>> > more complicated than that, and they are not uniform, but that is the
>> > general overview.
>> >
>> > I will take a look at using the comap operator. Do you know of any
>> > examples where it is doing something similar? Quickly looking, I am
>> > not seeing it used anywhere outside of tests, where it is largely
>> > just unifying the data coming in.
>> >
>> > I think accumulators will at least be a reasonable starting place for
>> > us, so thank you for pointing me in that direction.
>> >
>> > Thanks for your help!
>> >
>> > David
>> >
>> > On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
>> > <konstantin.kn...@tngtech.com> wrote:
>> >
>> > Hi David,
>> >
>> > I have no idea how many parallel jobs are possible in Flink, but
>> > generally speaking I do not think this approach will scale, because
>> > you will always have only one job manager for coordination. But there
>> > is definitely someone on the list who can tell you more about this.
>> >
>> > Regarding your second question: could you go into some more detail on
>> > what the jobs will do? Without knowing any details, I think a control
>> > Kafka topic which contains the "job creation/cancellation requests"
>> > of the users, in combination with a comap operator, is the better
>> > solution here. You could keep the currently active "jobs" as state in
>> > the comap and emit one record of the original stream per active
>> > user-job, together with some indicator on how to process it based on
>> > the request. What are your concerns with respect to insight into the
>> > process? I think with some nice accumulators you could get a good
>> > idea of what is going on; on the other hand, if I think about
>> > monitoring 1000s of jobs, I am actually not so sure ;)
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > On 22.03.2016 19:16, David Brelloch wrote:
>> > > Hi all,
>> > >
>> > > We are currently evaluating Flink for processing Kafka messages and
>> > > are running into some issues. The basic problem we are trying to
>> > > solve is allowing our end users to dynamically create jobs to alert
>> > > based off the messages coming from Kafka. At launch we figure we
>> > > need to support at least 15,000 jobs (3,000 customers with 5 jobs
>> > > each). I have the example Kafka job running and it is working
>> > > great. The questions I have are:
>> > >
>> > > 1.
>> > > On my local machine (admittedly much less powerful than what we
>> > > would be using in production), things fall apart once I get to
>> > > around 75 jobs. Can Flink handle a situation like this where we
>> > > are looking at thousands of jobs?
>> > > 2. Is this approach even the right way to go? Is there a different
>> > > approach that would make more sense? Everything will be listening
>> > > to the same Kafka topic, so the other thought we had was to have 1
>> > > job that processed everything and was configured by a separate
>> > > control Kafka topic. The concern we had there was that we would
>> > > almost completely lose insight into what was going on if there was
>> > > a slowdown.
>> > > 3. The current approach we are using for creating dynamic jobs is
>> > > building a common jar and then starting it with the configuration
>> > > data for the individual job. Does this sound reasonable?
>> > >
>> > > If any of these questions are answered elsewhere, I apologize. I
>> > > couldn't find any of this being discussed elsewhere.
>> > >
>> > > Thanks for your help.
>> > >
>> > > David
>> >
>> > --
>> > Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
>> > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> > Sitz: Unterföhring * Amtsgericht München * HRB 135082
>

--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
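[Editor's note] The increment/decrement bookkeeping David describes can be sketched in a few lines outside of Flink. The message shapes and the two event names are the ones from the thread; the function and variable names are illustrative only, and in a real Flink job the counts would live in keyed state rather than a plain dict:

```python
import json

# Per-customer admin counts driven by JSON event messages.
# EVENT_DELTAS maps the two event names from the thread's example to
# count changes; unknown events are ignored.
EVENT_DELTAS = {"addAdmin": 1, "deleteAdmin": -1}

def apply_event(counts, raw_message):
    """Apply one JSON event message to the per-customer counts dict."""
    event = json.loads(raw_message)
    delta = EVENT_DELTAS.get(event["event"], 0)  # 0 = ignore unknown events
    cid = event["customerid"]
    counts[cid] = counts.get(cid, 0) + delta
    return counts

# The three example messages from the thread:
messages = [
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "deleteAdmin"}',
]
counts = {}
for m in messages:
    apply_event(counts, m)
print(counts)  # {1: 1}
```

A notification step would then compare each updated count against whatever the customer said they care about.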
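[Editor's note] The control-topic-plus-comap pattern Konstantin suggests can also be sketched in plain code. In Flink this would be a CoFlatMapFunction over two connected streams (control and data); the class below is only a plain-Python illustration of the idea, and all message field names (`action`, `rule_id`, `config`) are assumptions, not anything from the thread:

```python
class RuleDispatcher:
    """Keeps the currently active user 'jobs' (rules) as state and tags
    each data record with every rule that should process it -- the
    'indicator on how to process it' from the thread."""

    def __init__(self):
        self.active = {}  # rule_id -> rule config

    def on_control(self, msg):
        # Control-topic message: create or cancel a user job.
        if msg["action"] == "create":
            self.active[msg["rule_id"]] = msg["config"]
        elif msg["action"] == "cancel":
            self.active.pop(msg["rule_id"], None)

    def on_data(self, record):
        # Emit one (rule_id, record) pair per active rule.
        return [(rule_id, record) for rule_id in self.active]

dispatcher = RuleDispatcher()
dispatcher.on_control(
    {"action": "create", "rule_id": "cust1-admins", "config": {}}
)
out = dispatcher.on_data({"customerid": 1, "event": "addAdmin"})
# out -> [("cust1-admins", {"customerid": 1, "event": "addAdmin"})]
```

Because every emitted pair carries its rule_id, downstream operators in the single shared job can key by rule, which also keeps per-rule counters or metrics possible and addresses some of the lost-insight concern from question 2.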