Hi all,

Thanks for the input. Much appreciated.

Regards,
Wouter

On Mon, 4 Mar 2019 at 20:40, Addison Higham <addis...@gmail.com> wrote:

> Hi there,
>
> As far as a runtime for students goes, Docker seems like your best bet.
> However, you could instead have them package a jar that implements some
> interface (for example, see
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/packaging.html,
> which details the `Program` interface) and then execute it inside a custom
> runner. That *might* be less prone to breakage, since submissions would
> need to conform to an interface, but it may require a fair amount of custom
> code, both to reduce the boilerplate of building up a program plan and for
> the custom runner itself. The code for how Flink loads a jar and turns it
> into something it can execute is mostly encapsulated in
> org.apache.flink.client.program.PackagedProgram, which might be a good
> thing to read and understand if you go down this route.
>
> If you want to give more insight, you could build some tooling to traverse
> the underlying graphs that the students build up in their data stream
> applications. For example, calling
> `StreamExecutionEnvironment.getStreamGraph` after the data stream is built
> returns a graph of the current job, which you can then traverse to see
> which operators and edges are in use. This is very similar to the process
> Flink uses to build the job DAG it renders in the UI. I am not sure what
> you could do as an automated analysis, but the StreamGraph API is quite
> low level and exposes a lot of information about the program.
>
> Hopefully that is a little bit helpful. Good luck, and it sounds like a fun
> course!
>
> On Mon, Mar 4, 2019 at 7:16 AM Wouter Zorgdrager <
> w.d.zorgdra...@tudelft.nl> wrote:
>
>> Hey all,
>>
>> Thanks for the replies. The issues we were running into (which are not
>> specific to Docker):
>> - Students who changed the template incorrectly caused the container to
>> fail.
>> - We give full points if the output matches our solutions (and none
>> otherwise), but it would be nice if we could give partial grades per
>> assignment (and better feedback). This would require looking not only at
>> the results but also at the operators used. The pitfall is that in many
>> cases a correct solution can be achieved in multiple ways. I came across a
>> Flink test library [1] which allows testing Flink code more extensively,
>> but it seems to be available only in Java.
>>
>> In retrospect, I do think using Docker is a good approach, as Fabian
>> confirms. However, the way we currently assess student solutions might be
>> improved. I assume that in your trainings manual feedback is given, but
>> unfortunately this is quite difficult with so many students.
>>
>> Cheers,
>> Wouter
>>
>> 1: https://github.com/ottogroup/flink-spector
>>
>> On Mon, 4 Mar 2019 at 14:39, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Wouter,
>>>
>>> We are using Docker Compose (Flink JM, Flink TM, Kafka, Zookeeper)
>>> setups for our trainings and it is working very well.
>>> We have an additional container that feeds a Kafka topic via the
>>> command-line producer to simulate somewhat realistic behavior.
>>> Of course, you can do it without Kafka as well and use some kind of
>>> data-generating source that reads from a file that is replaced for
>>> evaluation.
>>>
>>> The biggest benefit that I see with using Docker is that the students
>>> have an environment for development and testing that is close to the
>>> grading situation.
>>> You do not need to provide infrastructure, but everyone is running it
>>> locally in a well-defined context.
>>>
>>> So, as Joern said, what problems do you see with Docker?
>>>
>>> Best,
>>> Fabian
>>>
>>> On Mon, 4 Mar 2019 at 13:44, Jörn Franke <
>>> jornfra...@gmail.com> wrote:
>>>
>>>> It would help to understand the current issues that you have with this
>>>> approach.
>>>> I used a similar approach (not with Flink, but with a similar big data
>>>> technology) some years ago.
>>>>
>>>> > On 4 Mar 2019 at 11:32, Wouter Zorgdrager <
>>>> > w.d.zorgdra...@tudelft.nl> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I'm working on a setup to use Apache Flink in an assignment for a Big
>>>> > Data (bachelor) university course and I'm interested in your view on
>>>> > this. To sketch the situation:
>>>> > - > 200 students follow this course
>>>> > - students have to write some (simple) Flink applications using the
>>>> > DataStream API; the focus is on writing the transformation code
>>>> > - students need to write Scala code
>>>> > - we provide a dataset and a template (Scala class) with function
>>>> > signatures and a detailed description per application,
>>>> > e.g.: def assignment_one(input: DataStream[Event]): DataStream[(String, Int)] = ???
>>>> > - we provide some setup code, like parsing of data and setting up the
>>>> > streaming environment
>>>> > - assignments need to be auto-graded, based on correct results
>>>> >
>>>> > In last year's edition of the course we approached this with a custom
>>>> > Docker container. This container first compiled the students' code, ran
>>>> > all the Flink applications against a different dataset, and then
>>>> > verified the output against our solutions. This was turned into a grade
>>>> > and reported back to the student. Although this was a working approach,
>>>> > I think we can do better.
>>>> >
>>>> > I'm wondering if any of you have experience with using Apache Flink
>>>> > in a university course (or have seen this somewhere), as well as with
>>>> > assessing Flink code.
>>>> >
>>>> > Thanks a lot!
>>>> >
>>>> > Kind regards,
>>>> > Wouter Zorgdrager
>>>>
>>>
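
On the partial-grading wish raised in this thread (full points only on an exact match today), one possible direction is to score the collected output of an assignment by how much of it overlaps with the reference output. The sketch below is only an illustration under assumptions: `PartialGrader`, `counts` and `score` are hypothetical names, not part of Flink or flink-spector, and it assumes the results of a job have already been collected into an in-memory `Seq`. Records are compared as a multiset, so the order in which a streaming job emits them does not matter.

```scala
// Hypothetical grading helper; plain Scala, no Flink dependency.
object PartialGrader {

  /** Counts how often each record occurs, so duplicate records are respected. */
  def counts[A](records: Seq[A]): Map[A, Int] =
    records.groupBy(identity).map { case (record, group) => record -> group.size }

  /**
   * Returns a score between 0.0 and 1.0: the number of records that appear in
   * both outputs, divided by the size of the larger output. Dividing by the
   * larger size means missing records and extra records both lower the score.
   */
  def score[A](expected: Seq[A], actual: Seq[A]): Double = {
    val expectedCounts = counts(expected)
    val actualCounts = counts(actual)
    // For each expected record, credit at most as many copies as were produced.
    val matched = expectedCounts.map { case (record, n) =>
      math.min(n, actualCounts.getOrElse(record, 0))
    }.sum
    val denominator = math.max(expected.size, actual.size)
    if (denominator == 0) 1.0 else matched.toDouble / denominator
  }
}
```

For example, `PartialGrader.score(Seq(("a", 1), ("b", 2), ("c", 3)), Seq(("a", 1), ("b", 2), ("x", 9)))` evaluates to 2/3, because two of the three expected records are present. This only looks at results; it says nothing about which operators a student used.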