Jamie,

That looks fantastic! Thanks for the help.

David

On Tue, Mar 22, 2016 at 6:22 PM, Jamie Grier <ja...@data-artisans.com> wrote:

> Hi David,
>
> Here's an example of something similar to what you're talking about:
> https://github.com/jgrier/FilteringExample
>
> Have a look at the TweetImpressionFilteringJob.
>
> -Jamie
>
> On Tue, Mar 22, 2016 at 2:24 PM, David Brelloch <brell...@gmail.com> wrote:
>
>> Konstantin,
>>
>> Not a problem. Thanks for pointing me in the right direction.
>>
>> David
>>
>> On Tue, Mar 22, 2016 at 5:17 PM, Konstantin Knauf
>> <konstantin.kn...@tngtech.com> wrote:
>>
>>> Hi David,
>>>
>>> Interesting use case. I think this can be done nicely with a comap. Let
>>> me know if you run into problems; unfortunately I am not aware of any
>>> open-source examples.
>>>
>>> Cheers,
>>>
>>> Konstantin
>>>
>>> On 22.03.2016 21:07, David Brelloch wrote:
>>> > Konstantin,
>>> >
>>> > For now the jobs will largely just involve incrementing or
>>> > decrementing based on the JSON message coming in. We will probably
>>> > look at adding windowing later, but for now that isn't a huge
>>> > priority.
>>> >
>>> > As an example of what we are looking to do, let's say the following
>>> > 3 messages were read from the Kafka topic:
>>> > {"customerid": 1, "event": "addAdmin"}
>>> > {"customerid": 1, "event": "addAdmin"}
>>> > {"customerid": 1, "event": "deleteAdmin"}
>>> >
>>> > If the customer with id 1 had said they care about that type of
>>> > message, we would expect to be tracking the number of admins and to
>>> > notify them of the current count. The events are obviously much more
>>> > complicated than that, and they are not uniform, but that is the
>>> > general overview.
>>> >
>>> > I will take a look at using the comap operator. Do you know of any
>>> > examples where it is doing something similar? Quickly looking, I am
>>> > not seeing it used anywhere outside of tests, where it is largely
>>> > just unifying the data coming in.
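[Editor's note: the increment/decrement logic described above can be sketched in a few lines. This is a plain-Python simulation, not Flink code; the event-to-delta mapping is an assumption based on the sample messages in the thread.]

```python
import json

# Assumed mapping from event name to admin-count delta; unrecognized
# events are ignored. Only these two names appear in the thread.
DELTAS = {"addAdmin": 1, "deleteAdmin": -1}

def count_admins(messages):
    """Fold a stream of JSON strings into per-customer admin counts."""
    counts = {}
    for raw in messages:
        event = json.loads(raw)
        cid = event["customerid"]
        counts[cid] = counts.get(cid, 0) + DELTAS.get(event["event"], 0)
    return counts

sample = [
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "deleteAdmin"}',
]
print(count_admins(sample))  # → {1: 1}
```

Under this interpretation, two adds followed by one delete net to one admin for customer 1.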
>>> >
>>> > I think accumulators will at least be a reasonable starting place
>>> > for us, so thank you for pointing me in that direction.
>>> >
>>> > Thanks for your help!
>>> >
>>> > David
>>> >
>>> > On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
>>> > <konstantin.kn...@tngtech.com> wrote:
>>> >
>>> > Hi David,
>>> >
>>> > I have no idea how many parallel jobs are possible in Flink, but
>>> > generally speaking I do not think this approach will scale, because
>>> > you will always have only one JobManager for coordination. But there
>>> > is definitely someone on the list who can tell you more about this.
>>> >
>>> > Regarding your second question: could you go into some more detail
>>> > about what the jobs will do? Without knowing any details, I think a
>>> > control Kafka topic which contains the "job creation/cancellation
>>> > requests" of the users, in combination with a comap operator, is the
>>> > better solution here. You could keep the currently active "jobs" as
>>> > state in the comap and emit one record of the original stream per
>>> > active user-job, together with some indicator on how to process it
>>> > based on the request. What are your concerns with respect to insight
>>> > into the process? I think with some nice accumulators you could get
>>> > a good idea of what is going on; on the other hand, if I think about
>>> > monitoring thousands of jobs, I am actually not so sure ;)
>>> >
>>> > Cheers,
>>> >
>>> > Konstantin
>>> >
>>> > On 22.03.2016 19:16, David Brelloch wrote:
>>> > > Hi all,
>>> > >
>>> > > We are currently evaluating Flink for processing Kafka messages
>>> > > and are running into some issues. The basic problem we are trying
>>> > > to solve is allowing our end users to dynamically create jobs to
>>> > > alert based on the messages coming from Kafka.
>>> > > At launch we figure we need to support at least 15,000 jobs
>>> > > (3,000 customers with 5 jobs each). I have the example Kafka job
>>> > > running and it is working great. The questions I have are:
>>> > >
>>> > > 1. On my local machine (admittedly much less powerful than we
>>> > >    would be using in production) things fall apart once I get to
>>> > >    around 75 jobs. Can Flink handle a situation like this, where
>>> > >    we are looking at thousands of jobs?
>>> > > 2. Is this approach even the right way to go? Is there a
>>> > >    different approach that would make more sense? Everything will
>>> > >    be listening to the same Kafka topic, so the other thought we
>>> > >    had was to have one job that processed everything and was
>>> > >    configured by a separate control Kafka topic. The concern we
>>> > >    had there was that we would almost completely lose insight into
>>> > >    what was going on if there was a slowdown.
>>> > > 3. The current approach we are using for creating dynamic jobs is
>>> > >    building a common jar and then starting it with the
>>> > >    configuration data for the individual job. Does this sound
>>> > >    reasonable?
>>> > >
>>> > > If any of these questions are answered elsewhere, I apologize. I
>>> > > couldn't find this being discussed anywhere.
>>> > >
>>> > > Thanks for your help.
>>> > >
>>> > > David
>>> >
>>> > --
>>> > Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
>>> > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>> > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>> > Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
> --
> Jamie Grier
> data Artisans, Director of Applications Engineering
> @jamiegrier <https://twitter.com/jamiegrier>
> ja...@data-artisans.com
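
[Editor's note: Konstantin's control-topic suggestion can be sketched as a toy, plain-Python simulation of a comap — not the Flink API; the class, field, and action names here are invented for illustration. One stateful operator receives both the control stream and the data stream, keeps the active user-jobs as state, and emits one tagged copy of each data record per matching job.]

```python
class JobControlCoMap:
    """Toy simulation of a two-input (comap-style) operator: the control
    stream mutates shared state; the data stream is fanned out per job."""

    def __init__(self):
        # State: customerid -> set of that customer's active job names.
        self.active = {}

    def map1(self, control):
        """Control-topic record: create or cancel a user job."""
        jobs = self.active.setdefault(control["customerid"], set())
        if control["action"] == "create":
            jobs.add(control["job"])
        elif control["action"] == "cancel":
            jobs.discard(control["job"])
        return []  # Control records emit nothing downstream.

    def map2(self, event):
        """Data-topic record: emit one (job, event) pair per active job."""
        jobs = self.active.get(event["customerid"], ())
        return [(job, event) for job in sorted(jobs)]


op = JobControlCoMap()
op.map1({"customerid": 1, "action": "create", "job": "adminCount"})
out = op.map2({"customerid": 1, "event": "addAdmin"})
print(out)  # → [('adminCount', {'customerid': 1, 'event': 'addAdmin'})]
```

Because all user "jobs" live in one operator's state, creating or cancelling one is a cheap state update rather than a new cluster job, which is what makes this pattern attractive at the 15,000-job scale discussed above.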