Jamie, I think you raise a valid concern but I would hesitate to accept the suggestion that the low-level API be promoted to app developers.
Higher-level abstractions tend to be more constrained and more optimized, whereas lower-level abstractions tend to be more powerful, more laborious to use, and to provide the system with less knowledge. It is a classic tradeoff. I think it is important to consider what the important, distinguishing characteristics of the Flink framework are: exactly-once guarantees, event-time support, support for job upgrades without data loss, fault tolerance, etc. I suspect that the high-level abstraction provided to app developers is probably needed to retain those characteristics. I think Vasia makes a good point that SQL might be a good alternative way to ease into Flink. Finally, I believe the low-level API is primarily intended for extension purposes (connectors, operations, etc.), not app development. It could use better documentation to ensure that third-party extensions support those key characteristics.

-Eron

> On Aug 16, 2016, at 3:12 AM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>
> Hi Jamie,
>
> thanks for sharing your thoughts on this! You're raising some interesting points.
>
> Whether users find the lower-level primitives more intuitive depends on their background, I believe. From what I've seen, if users are coming from the S4/Storm world and are used to the "compositional" way of streaming, then indeed it's easier for them to think and operate on that level. These are usually people who have seen/built streaming things before trying out Flink.
> But if we're talking about analysts and people coming from the "batch" way of thinking, or people used to working with SQL/Python, then the higher-level declarative API is probably easier to understand.
>
> I do think that we should make the lower-level API more visible and document it properly, but I'm not sure if we should teach Flink on this level first. I think that presenting it as a set of "advanced" features makes more sense, actually.
>
> Cheers,
> -Vasia.
>
> On 16 August 2016 at 04:24, Jamie Grier <ja...@data-artisans.com> wrote:
>
>> You lost me at lattice, Aljoscha ;)
>>
>> I do think something like the more powerful N-way FlatMap w/ Timers Aljoscha is describing here would probably solve most of the problem. Often Flink's higher-level primitives work well for people, and that's great. It's just that I also spend a fair amount of time discussing with people how to map what they know they want to do onto operations that aren't a perfect fit, and it sometimes liberates them when they realize they can just implement it the way they want by dropping down a level. They usually don't go there themselves, though.
>>
>> I mention teaching this "first" and then the higher layers, I guess, because that's just a matter of teaching philosophy. I think it's good to see the basic operations that are available first and then understand that the other abstractions are built on top of that. That way you're not afraid to drop down to basics when you know what you want to get done.
>>
>> -Jamie
>>
>>
>> On Mon, Aug 15, 2016 at 2:11 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>
>>> Hi All,
>>> I also thought about this recently. A good thing would be to add a good user-facing operator that behaves more or less like an enhanced FlatMap with multiple inputs, multiple outputs, state access, and keyed timers. I'm a bit hesitant, though, since users rarely think about the implications that come with state updates and out-of-order events. If you don't implement a stateful operator correctly you get pretty much arbitrary results.
>>>
>>> The problem with out-of-order event arrival and state updates is that the state basically has to transition monotonically "upwards" through a lattice for the computation to make sense. I know this sounds rather theoretical, so I'll try to explain with an example.
>>> Say you have an operator that waits for timestamped elements A, B, C to arrive in timestamp order and then does some processing. The naive approach would be to have a small state machine that tracks which elements you have seen so far. The state machine has four states: NOTHING, SEEN_A, SEEN_B, and SEEN_C. The state machine is supposed to traverse these states linearly as the elements arrive. This doesn't work, however, when elements arrive in an order that does not match their timestamp order. What the user should do instead is keep a "Set" state that tracks the elements seen so far. Once it has seen {A, B, C}, the operator must check the timestamps and then do the processing, if required. The set of possible combinations of A, B, and C forms a lattice when combined with the "subset" relation, and traversal through these sets is monotonically "upwards", so it works regardless of the order the elements arrive in. (I recently pointed this out on the Beam mailing list, and Kenneth Knowles rightly pointed out that what I was describing was in fact a lattice.)
>>>
>>> I know this is a bit off-topic, but I think it's very easy for users to write wrong operators when they are dealing with state. We should still have a good API for it, though. Just wanted to make people aware of this.
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Mon, 15 Aug 2016 at 08:18 Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>>> It really depends on the skill level of the developer. Using the low-level API requires thinking about many details (e.g. state handling) that could be done wrong.
>>>>
>>>> As Flink gets a broader community, more people will use it who might not have the required skill level to deal with the low-level API. For more trained users, it is of course a powerful tool!
>>>>
>>>> I guess it boils down to the question of what type of developer Flink targets, and whether the low-level API should be aggressively advertised or not. Also keep in mind that many people criticized Storm's low-level API as hard to program, etc.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 08/15/2016 07:46 AM, Gyula Fóra wrote:
>>>>> Hi Jamie,
>>>>>
>>>>> I agree that it is often much easier to work on the lower-level APIs if you know what you are doing.
>>>>>
>>>>> I think it would be nice to have very clean abstractions on that level so we could teach this to the users first, but currently I think it's not easy enough to be a good starting point.
>>>>>
>>>>> The user needs to understand a lot about the system if they don't want to hurt other parts of the pipeline: for instance, working with the StreamRecords, propagating watermarks, working with state internals.
>>>>>
>>>>> This all might be overwhelming at first glance. But maybe we can slim some abstractions down to the point where this becomes kind of an extension of the RichFunctions.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> On Sat, Aug 13, 2016, 17:48 Jamie Grier <ja...@data-artisans.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I've noticed a few times now when trying to help users implement particular things in the Flink API that it can be complicated to map what they know they are trying to do onto higher-level Flink concepts such as windowing or Connect/CoFlatMap/ValueState, etc.
>>>>>>
>>>>>> At some point it just becomes easier to think about writing a Flink operator yourself that is integrated into the pipeline with a transform() call.
>>>>>>
>>>>>> It can just be easier to think at a more basic level.
>>>>>> For example, I can write an operator that can consume one or two input streams (should probably be N), update state which is managed for me fault-tolerantly, and output elements or set up timers/triggers that give me callbacks from which I can also update state or emit elements.
>>>>>>
>>>>>> When you think at this level you realize you can program just about anything you want. You can create whatever fault-tolerant data structures you want, and easily execute robust stateful computation over data streams at scale. This is the real technology and power of Flink, IMO.
>>>>>>
>>>>>> Also, at this level I don't have to think about the complexities of windowing semantics, learn as much API, etc. I can easily have some inputs that are broadcast, others that are keyed, manage my own state in whatever data structure makes sense, etc. If I know exactly what I actually want to do, I can just do it with the full power of my chosen language, data structures, etc. I'm not "restricted" to trying to map everything onto higher-level Flink constructs, which is sometimes actually more complicated.
>>>>>>
>>>>>> Programming at this level is actually fairly easy to do, but people seem a bit afraid of this level of the API. They think of it as low-level or custom hacking.
>>>>>>
>>>>>> Anyway, I guess my thought is this: should we explain Flink to people at this level *first*? Show that you have nearly unlimited power and flexibility to build what you want, *and only then* explain the higher-level APIs they can use *if* those match their use cases well.
>>>>>>
>>>>>> Would this better demonstrate to people the power of Flink and maybe *liberate* them a bit from feeling they have to map their problem onto a more complex set of higher-level primitives?
>>>>>> I see people trying to shoe-horn what they are really trying to do, which is simple to explain in English, onto windows, triggers, CoFlatMaps, etc., and this gets complicated sometimes. It's like an impedance mismatch. You could just solve the problem very easily in straight Java/Scala.
>>>>>>
>>>>>> Anyway, it's very easy to drop down a level in the API and program whatever you want, but users don't seem to *perceive* it that way.
>>>>>>
>>>>>> Just some thoughts... Any feedback? Have any of you had similar experiences when working with newer Flink users, or as a newer Flink user yourself? Can/should we do anything to make the *lower*-level API more accessible/visible to users?
>>>>>>
>>>>>> -Jamie
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>>
>> Jamie Grier
>> data Artisans, Director of Applications Engineering
>> @jamiegrier <https://twitter.com/jamiegrier>
>> ja...@data-artisans.com
>>
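[Editor's note] Aljoscha's Set-versus-state-machine point earlier in the thread can be sketched outside of Flink in a few lines of plain Java. The class and method names below (SeenTracker, onElement) are illustrative only, not Flink API: the point is that the state only ever grows by set union, so it moves monotonically "upwards" through the subset lattice and converges to {A, B, C} regardless of arrival order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch (not Flink API): track seen elements in a Set instead
// of a linear state machine, so out-of-order arrival still converges.
public class SeenTracker {
    private static final Set<String> EXPECTED =
            new HashSet<>(Arrays.asList("A", "B", "C"));
    private final Set<String> seen = new HashSet<>();

    /** Record an element; returns true once all of A, B, C have been seen. */
    public boolean onElement(String element) {
        seen.add(element); // set union: a monotone step upwards in the subset lattice
        return seen.containsAll(EXPECTED);
    }
}
```

A linear NOTHING -> SEEN_A -> SEEN_B -> SEEN_C machine would get stuck if, say, B arrived before A; the Set version does not care about arrival order.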
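[Editor's note] A minimal, Flink-free sketch of the kind of low-level operator Jamie describes: two inputs, per-key state, and a timer callback that emits and clears. All names here (TwoInputCounter, processElement1, processElement2, onTimer) are hypothetical stand-ins that mimic the shape of, but are not, the real Flink operator interfaces; in actual Flink the state would be checkpointed for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the real Flink operator API: a two-input stateful
// "operator" with a timer callback, showing the low-level programming model.
public class TwoInputCounter {
    private final Map<String, Long> countPerKey = new HashMap<>(); // managed state in real Flink
    private final List<String> output = new ArrayList<>();

    public void processElement1(String key) { // first input: data events
        countPerKey.merge(key, 1L, Long::sum);
    }

    public void processElement2(String key) { // second input: e.g. control messages
        countPerKey.remove(key);              // reset the state for this key
    }

    public void onTimer(String key) {         // timer callback: emit and clear
        Long count = countPerKey.remove(key);
        if (count != null) {
            output.add(key + "=" + count);
        }
    }

    public List<String> getOutput() {
        return output;
    }
}
```

The appeal of this level is visible even in the sketch: the "operator" is ordinary Java with ordinary data structures, with fault tolerance and timer delivery left to the framework.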