Some thoughts about the lower-level Flink APIs

Jamie Grier Sat, 13 Aug 2016 08:48:28 -0700

Hey all,

I've noticed a few times now when trying to help users implement particular
things in the Flink API that it can be complicated to map what they know
they are trying to do onto higher-level Flink concepts such as windowing or
Connect/CoFlatMap/ValueState, etc.


At some point it just becomes easier to think about writing a Flink
operator yourself that is integrated into the pipeline with a transform()
call.

It can just be easier to think at a more basic level.  For example, I can
write an operator that consumes one or two input streams (this should
probably be N), updates state that is managed for me fault-tolerantly, and
emits elements or sets up timers/triggers that give me callbacks from which
I can also update state or emit elements.
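
To make that shape concrete, here is a minimal sketch in plain Java.  It
deliberately does not use the Flink API: Context, CountingOperator,
processElement1/processElement2, onTimer and advanceTime are all
hypothetical names invented for illustration, and the "state" is an
ordinary in-memory map rather than anything Flink would checkpoint.  It
only models the idea of two inputs, per-key state and timer callbacks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy, Flink-free model of a two-input stateful operator with timers. */
class TwoInputOperatorSketch {

    /** Hypothetical stand-in for the services a runtime would hand the operator. */
    static class Context {
        final List<String> output = new ArrayList<>();
        // Timer timestamp -> keys that asked for a callback at that time.
        final TreeMap<Long, List<String>> timers = new TreeMap<>();

        void emit(String element) {
            output.add(element);
        }

        void registerTimer(long timestamp, String key) {
            timers.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(key);
        }
    }

    /** Counts events per key and emits the count when the key's timer fires. */
    static class CountingOperator {
        // Assumption: a real runtime would manage this state fault-tolerantly.
        private final Map<String, Long> counts = new HashMap<>();
        private long flushDelay = 0L;

        /** Input 1: keyed data events. The first event for a key sets a timer. */
        void processElement1(String key, long timestamp, Context ctx) {
            long updated = counts.merge(key, 1L, Long::sum);
            if (updated == 1L) {
                ctx.registerTimer(timestamp + flushDelay, key);
            }
        }

        /** Input 2: a control stream that changes the delay for future timers. */
        void processElement2(long newFlushDelay) {
            this.flushDelay = newFlushDelay;
        }

        /** Timer callback: emit the count for the key and clear its state. */
        void onTimer(long timestamp, String key, Context ctx) {
            Long count = counts.remove(key);
            if (count != null) {
                ctx.emit(key + "=" + count);
            }
        }
    }

    /** Fires every registered timer whose timestamp is at or before {@code now}. */
    static void advanceTime(CountingOperator op, Context ctx, long now) {
        while (!ctx.timers.isEmpty() && ctx.timers.firstKey() <= now) {
            Map.Entry<Long, List<String>> due = ctx.timers.pollFirstEntry();
            for (String key : due.getValue()) {
                op.onTimer(due.getKey(), key, ctx);
            }
        }
    }

    public static void main(String[] args) {
        CountingOperator op = new CountingOperator();
        Context ctx = new Context();
        op.processElement2(100L);           // control input: flush 100 units after first event
        op.processElement1("a", 10L, ctx);  // registers a timer for "a" at 110
        op.processElement1("a", 20L, ctx);
        op.processElement1("b", 50L, ctx);  // registers a timer for "b" at 150
        advanceTime(op, ctx, 120L);         // only the timer for "a" is due
        System.out.println(ctx.output);     // prints [a=2]
    }
}
```

The point of the sketch is only that the whole computation is a handful of
callbacks plus whatever data structure suits the problem; in a real job the
runtime, not the user, would drive the callbacks and persist the state.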

When you think at this level you realize you can program just about
anything you want.  You can create whatever fault-tolerant data structures
you want, and easily execute robust stateful computation over data streams
at scale.  This is the real technology and power of Flink IMO.

Also, at this level I don't have to think about the complexities of
windowing semantics, learn as much API, etc.  I can easily have some inputs
that are broadcast, others that are keyed, manage my own state in whatever
data structure makes sense, etc.  If I know exactly what I actually want to
do I can just do it with the full power of my chosen language, data
structures, etc.  I'm not "restricted" to trying to map everything onto
higher-level Flink constructs which is sometimes actually more complicated.

Programming at this level is actually fairly easy to do, but people seem a
bit afraid of this level of the API.  They think of it as low-level or
custom hacking.

Anyway, I guess my thought is this:  Should we explain Flink to people at
this level *first*?  Show that you have nearly unlimited power and
flexibility to build what you want, *and only then* explain the
higher-level APIs they can use *if* those match their use cases well.

Would this better demonstrate to people the power of Flink and maybe
*liberate* them a bit from feeling they have to map their problem onto a
more complex set of higher-level primitives?  I see people trying to
shoehorn what they are really trying to do, which is simple to explain in
English, onto windows, triggers, CoFlatMaps, etc., and this gets
complicated sometimes.  It's like an impedance mismatch.  You could solve
the problem very easily by programming it in straight Java/Scala.

Anyway, it's very easy to drop down a level in the API and program whatever
you want, but users don't seem to *perceive* it that way.

Just some thoughts...  Any feedback?  Have any of you had similar
experiences when working with newer Flink users or as a newer Flink user
yourself?  Can/should we do anything to make the *lower* level API more
accessible/visible to users?

-Jamie
