Hi there,

As far as a runtime for students goes, Docker seems like your best bet.
However, you could instead have them package a jar that conforms to some
interface (for example, see
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/packaging.html,
which details the `Program` interface) and then execute it inside a custom
runner. That *might* be less prone to breakage, since submissions would
have to conform to an interface, but it may require a fair amount of custom
code, both to reduce the boilerplate of building up a program plan and to
implement the custom runner. The code for how Flink loads a jar and turns
it into something it can execute is mostly encapsulated in
org.apache.flink.client.program.PackagedProgram, which is worth reading and
understanding if you go down this route.
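
For illustration, a grading runner could load a submission along these
lines (a rough sketch against the Flink 1.7 PackagedProgram API; the jar
path and entry-point class name are made up for the example):

    import java.io.File
    import org.apache.flink.client.program.PackagedProgram

    object GradingRunner {
      def main(args: Array[String]): Unit = {
        // Load the student's jar; "assignments.AssignmentOne" is a
        // placeholder entry point that your template would prescribe.
        val program = new PackagedProgram(
          new File("student-solution.jar"), "assignments.AssignmentOne")

        // Sanity-check the submission before trying to execute it.
        println(s"main class: ${program.getMainClassName}")
        println(s"uses Program entry point: ${program.isUsingProgramEntryPoint}")
      }
    }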

If you want more insight into the students' programs, you could build some
tooling to traverse the underlying graphs that they build up in their data
stream applications. For example, calling
`StreamExecutionEnvironment.getStreamGraph` after the data stream is built
returns a graph of the current job, which you can then traverse to see
which operators and edges are in use. This is very similar to how Flink
builds the job DAG it renders in the UI. I am not sure how far an automated
analysis could go, but the StreamGraph API is quite low level and exposes a
lot of information about the program.
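
A minimal sketch of such a traversal (assuming Flink 1.7; in the Scala API
the StreamGraph is reachable through the wrapped Java environment):

    import scala.collection.JavaConverters._
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // ... the student's pipeline is built on `env` here ...

    // Build the StreamGraph from the registered transformations and walk it.
    val graph = env.getJavaEnv.getStreamGraph
    for (node <- graph.getStreamNodes.asScala) {
      println(s"operator ${node.getId}: ${node.getOperatorName}")
      for (edge <- node.getOutEdges.asScala)
        println(s"  edge ${node.getId} -> ${edge.getTargetId}")
    }

You could, for example, check that a windowed aggregation actually appears
as an operator in the graph, rather than only comparing output.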

Hopefully that is a little helpful. Good luck, and it sounds like a fun
course!


On Mon, Mar 4, 2019 at 7:16 AM Wouter Zorgdrager <w.d.zorgdra...@tudelft.nl>
wrote:

> Hey all,
>
> Thanks for the replies. The issues we were running into (which are not
> specific to Docker):
> - Students who changed the template incorrectly caused the container to
> fail.
> - We give full points if the output matches our solutions (and none
> otherwise), but it would be nice if we could give partial grades per
> assignment (and better feedback). This would require looking not only at
> the results but also at the operators used. The pitfall is that in many
> cases a correct solution can be achieved in multiple ways. I came across a
> Flink test library [1] which allows Flink code to be tested more
> extensively, but it seems to be Java-only.
>
> In retrospect, I do think using Docker is a good approach, as Fabian
> confirms. However, the way we currently assess student solutions could be
> improved. I assume that in your trainings feedback is given manually, but
> unfortunately this is quite difficult with so many students.
>
> Cheers,
> Wouter
>
> 1: https://github.com/ottogroup/flink-spector
>
>
Op Mon, Mar 4, 2019 at 14:39, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Wouter,
>>
>> We are using Docker Compose setups (Flink JM, Flink TM, Kafka, ZooKeeper)
>> for our trainings and it is working very well.
>> We have an additional container that feeds a Kafka topic via the
>> command-line producer to simulate somewhat realistic behavior.
>> Of course, you can also do it without Kafka and use some kind of
>> data-generating source that reads from a file, which is replaced for
>> evaluation.
>>
>> The biggest benefit that I see with using Docker is that the students
>> have an environment for development and testing that is close to the
>> grading situation.
>> You do not need to provide infrastructure; everyone runs it locally in a
>> well-defined context.
>>
>> So, as Joern said, what problems do you see with Docker?
>>
>> Best,
>> Fabian
>>
>> On Mon, Mar 4, 2019 at 13:44, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> It would help to understand the current issues that you have with this
>>> approach. I used a similar approach (not with Flink, but with a similar
>>> big data technology) some years ago.
>>>
>>> > On Mar 4, 2019, at 11:32, Wouter Zorgdrager <
>>> w.d.zorgdra...@tudelft.nl> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I'm working on a setup to use Apache Flink in an assignment for a Big
>>> Data (bachelor) university course and I'm interested in your view on this.
>>> To sketch the situation:
>>> > - over 200 students follow this course
>>> > - students have to write some (simple) Flink applications using the
>>> DataStream API; the focus is on writing the transformation code
>>> > - students need to write Scala code
>>> > - we provide a dataset and a template (Scala class) with function
>>> signatures and a detailed description per application, e.g.:
>>> > `def assignment_one(input: DataStream[Event]): DataStream[(String, Int)] = ???`
>>> > - we provide some setup code like parsing of data and setting up the
>>> streaming environment
>>> > - assignments need to be auto-graded, based on correct results
>>> >
>>> > In last year's course edition we approached this with a custom Docker
>>> container. This container first compiled the students' code, ran all the
>>> Flink applications against a different dataset, and then verified the
>>> output against our solutions. The result was turned into a grade and
>>> reported back to the student. Although this was a working approach, I
>>> think we can do better.
>>> >
>>> > I'm wondering if any of you have experience with using Apache Flink
>>> in a university course (or have seen this done somewhere), as well as
>>> with assessing Flink code.
>>> >
>>> > Thanks a lot!
>>> >
>>> > Kind regards,
>>> > Wouter Zorgdrager
>>>
>>
