Re: Using Flink in a university course

Wouter Zorgdrager Wed, 06 Mar 2019 05:43:16 -0800

Hi all,

Thanks for the input. Much appreciated.


Regards,
Wouter

On Mon, Mar 4, 2019 at 20:40, Addison Higham <addis...@gmail.com> wrote:

> Hi there,
>
> As far as a runtime for students, it seems like Docker is your best bet.
> However, you could have them instead package a jar using some interface
> (for example, see
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/packaging.html,
> which details the `Program` interface) and then execute it inside a custom
> runner. That *might* result in something less prone to breakage as it would
> need to conform to an interface, but it may require a fair amount of custom
> code to reduce the boilerplate to build up a program plan as well as the
> custom runner. The code for how Flink loads a jar and turns it into
> something it can execute is mostly encapsulated
> in org.apache.flink.client.program.PackagedProgram, which might be a good
> thing to read and understand if you go down this route.
>
> If you want to give more insight, you could build some tooling to traverse
> the underlying graphs that the students build up in their data stream
> application. For example, calling
> `StreamExecutionEnvironment.getStreamGraph` after the data stream is built
> will get a graph of the current job, which you can then use to traverse a
> graph and see which operators and edges are in use. This is very similar to
> the process Flink uses to build the job DAG it renders in the UI. I am not
> sure what you could do as an automated analysis, but the StreamGraph API is
> quite low-level and exposes a lot of information about the program.
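
A minimal sketch of the graph traversal described above, assuming the Flink 1.7 Scala DataStream API on the classpath. The small word-count pipeline is a hypothetical stand-in for a student's job, and `getStreamGraph` is an internal API whose behaviour may differ between Flink versions:

```scala
import org.apache.flink.streaming.api.scala._
import scala.collection.JavaConverters._

object InspectStreamGraph {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical stand-in for a student's pipeline.
    env.fromElements("a", "b", "a")
      .map(w => (w, 1))
      .keyBy(_._1)
      .sum(1)
      .print()

    // Build the graph of the job defined so far; nothing is executed here.
    val graph = env.getStreamGraph

    // List every operator and the edges leaving it.
    for (node <- graph.getStreamNodes.asScala) {
      println(s"operator ${node.getId}: ${node.getOperatorName}")
      for (edge <- node.getOutEdges.asScala) {
        println(s"  edge ${edge.getSourceId} -> ${edge.getTargetId}")
      }
    }
  }
}
```

A grader could collect the operator names this way and compare them against a list of expected operators, though what counts as "expected" is left open here.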
>
> Hopefully that is a little bit helpful. Good luck and sounds like a fun
> course!
>
>
> On Mon, Mar 4, 2019 at 7:16 AM Wouter Zorgdrager <
> w.d.zorgdra...@tudelft.nl> wrote:
>
>> Hey all,
>>
>> Thanks for the replies. The issues we were running into (which are not
>> specific to Docker):
>> - When students changed the template incorrectly, the container failed.
>> - We give full points if the output matches our solutions (and none
>> otherwise), but it would be nice if we could give partial grades per
>> assignment (and better feedback). This would require looking not only at
>> the results but also at the operators used. The pitfall is that in many
>> cases a correct solution can be achieved in multiple ways. I came across a
>> Flink test library [1] which allows testing Flink code more extensively,
>> but it seems to be Java-only.
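
One possible way to get partial grades from the output alone, sketched in plain Scala without any Flink dependency. This is not the grading scheme used in the course: the `PartialCredit` object and its scoring rule (matched records divided by the larger of the two output sizes) are assumptions made for illustration, and the sketch does not inspect the operators used:

```scala
object PartialCredit {
  // Score in [0, 1]: the share of records that the expected and the actual
  // output have in common, ignoring order but respecting duplicates.
  // Dividing by the larger size penalises both missing and extra records.
  def score[A](expected: Seq[A], actual: Seq[A]): Double =
    if (expected.isEmpty && actual.isEmpty) 1.0
    else {
      // Seq.intersect computes a multiset intersection.
      val matched = expected.intersect(actual).size
      matched.toDouble / math.max(expected.size, actual.size)
    }
}
```

For example, `PartialCredit.score(Seq(("a", 2), ("b", 1)), Seq(("a", 2)))` gives half credit because one of the two expected records is missing.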
>>
>> In retrospect, I do think using Docker is a good approach, as Fabian
>> confirms. However, the way we currently assess student solutions might be
>> improved. I assume that in your trainings, manual feedback is given, but
>> unfortunately this is quite difficult for so many students.
>>
>> Cheers,
>> Wouter
>>
>> 1: https://github.com/ottogroup/flink-spector
>>
>>
>> On Mon, Mar 4, 2019 at 14:39, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Wouter,
>>>
>>> We are using Docker Compose (Flink JM, Flink TM, Kafka, Zookeeper)
>>> setups for our trainings and it is working very well.
>>> We have an additional container that feeds a Kafka topic via the
>>> command-line producer to simulate somewhat realistic behavior.
>>> Of course, you can also do it without Kafka and use some kind of data
>>> generating source that reads from a file that is replaced for evaluation.
>>>
>>> The biggest benefit that I see with using Docker is that the students
>>> have an environment for development and testing that is close to the
>>> grading situation.
>>> You do not need to provide infrastructure but everyone is running it
>>> locally in a well-defined context.
>>>
>>> So, as Joern said, what problems do you see with Docker?
>>>
>>> Best,
>>> Fabian
>>>
>>> On Mon, Mar 4, 2019 at 13:44, Jörn Franke <
>>> jornfra...@gmail.com> wrote:
>>>
>>>> It would help to understand the current issues that you have with this
>>>> approach. I used a similar approach (not with Flink, but with a similar
>>>> big data technology) some years ago.
>>>>
>>>> > On Mar 4, 2019, at 11:32, Wouter Zorgdrager <
>>>> w.d.zorgdra...@tudelft.nl> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I'm working on a setup to use Apache Flink in an assignment for a Big
>>>> Data (bachelor) university course and I'm interested in your view on this.
>>>> To sketch the situation:
>>>> > - more than 200 students follow this course
>>>> > - students have to write some (simple) Flink applications using the
>>>> DataStream API; the focus is on writing the transformation code
>>>> > - students need to write Scala code
>>>> > - we provide a dataset and a template (Scala class) with function
>>>> signatures and detailed description per application.
>>>> > e.g.: def assignment_one(input: DataStream[Event]):
>>>> DataStream[(String, Int)] = ???
>>>> > - we provide some setup code like parsing of data and setting up the
>>>> streaming environment
>>>> > - assignments need to be auto-graded, based on correct results
>>>> >
>>>> > In last year's edition of the course we approached this with a custom
>>>> Docker container. This container first compiled the students' code, ran
>>>> all the Flink applications against a different dataset and then verified
>>>> the output against our solutions. This was turned into a grade and
>>>> reported back to the student. Although this was a working approach, I
>>>> think we can do better.
>>>> >
>>>> > I'm wondering if any of you have experience with using Apache Flink
>>>> in a university course (or have seen this somewhere) as well as assessing
>>>> Flink code.
>>>> >
>>>> > Thanks a lot!
>>>> >
>>>> > Kind regards,
>>>> > Wouter Zorgdrager
>>>>
>>>
