The short answer is that DataSet is not serializable. I think the main underlying problem, though, is that Flink needs to see all DataSet operations before launching the job. If you have a DataSet<DataSet<A>>, then the operations on the inner DataSets would have to be specified inside the UDFs of operations on the outer DataSet. This is a problem because Flink cannot look inside the UDFs before the job starts: they are executed only once the job is already running.
There are some workarounds, though:

1. If you know that your inner DataSets will be small, you can replace them with a regular Java/Scala collection class, such as an Array or a List.

2. You can often flatten your data, that is, represent your nested collection with a flat one. Exactly how to do this depends on your use case. For example, suppose that we originally wanted to represent the lengths of the shortest paths between all pairs of vertices in a graph with a DataSet that, for every vertex, contains a DataSet giving the distances to all the other vertices:

    DataSet<Tuple2<Vertex, DataSet<Tuple2<Vertex, Int>>>>

This doesn't work because of the nested DataSets, but you could flatten it into the following:

    DataSet<Tuple3<Vertex, Vertex, Int>>

which is a DataSet that contains pairs of vertices together with their distances.

Btw. [1] is a paper in which some graph data structures with complex nesting are represented in Flink.

Best,
Gábor

[1] http://dbs.uni-leipzig.de/file/EPGM.pdf

2016-11-15 17:37 GMT+01:00 otherwise777 <wou...@onzichtbaar.net>:
> It seems what i tried did indeed not work.
> Can you explain me why that doesn't work though?
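To illustrate the flattening in workaround 2, here is a minimal sketch in plain Java, without Flink, so that it runs on its own. It assumes vertices are identified by String IDs and uses a Map of Maps as a stand-in for the nested structure; the nested form is also an instance of workaround 1, since the inner collection is an ordinary Map rather than a DataSet. The class and method names (FlattenDistances, flatten, regroup, sampleDistances) are hypothetical. In a real Flink job the flat rows would live in a DataSet<Tuple3<...>>, and regroup roughly corresponds to grouping on the source vertex field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenDistances {

    /** One flat row (source, target, distance): the analogue of Tuple3<Vertex, Vertex, Int>. */
    static final class Distance {
        final String source;
        final String target;
        final int distance;

        Distance(String source, String target, int distance) {
            this.source = source;
            this.target = target;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return "(" + source + ", " + target + ", " + distance + ")";
        }
    }

    /** Adds one (from, to, distance) entry to the nested representation. */
    private static void put(Map<String, Map<String, Integer>> m, String from, String to, int d) {
        m.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, d);
    }

    /** Hypothetical sample: all-pairs distances for the path graph a - b - c with unit edges. */
    static Map<String, Map<String, Integer>> sampleDistances() {
        Map<String, Map<String, Integer>> nested = new LinkedHashMap<>();
        put(nested, "a", "b", 1);
        put(nested, "a", "c", 2);
        put(nested, "b", "a", 1);
        put(nested, "b", "c", 1);
        put(nested, "c", "a", 2);
        put(nested, "c", "b", 1);
        return nested;
    }

    /** Nested -> flat: emits one row per (source, target) pair. */
    static List<Distance> flatten(Map<String, Map<String, Integer>> nested) {
        List<Distance> flat = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> outer : nested.entrySet()) {
            for (Map.Entry<String, Integer> inner : outer.getValue().entrySet()) {
                flat.add(new Distance(outer.getKey(), inner.getKey(), inner.getValue()));
            }
        }
        return flat;
    }

    /** Flat -> nested: groups the rows by their source vertex. */
    static Map<String, Map<String, Integer>> regroup(List<Distance> flat) {
        Map<String, Map<String, Integer>> nested = new LinkedHashMap<>();
        for (Distance d : flat) {
            put(nested, d.source, d.target, d.distance);
        }
        return nested;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> nested = sampleDistances();
        List<Distance> flat = flatten(nested);
        flat.forEach(System.out::println); // six rows, starting with (a, b, 1)
        System.out.println(regroup(flat).equals(nested)); // true: no information is lost
    }
}
```

The point of the round trip in main is that the flat form carries exactly the same information as the nested one, so nothing is lost by giving up the nesting; any per-vertex view can be rebuilt by grouping on the first field.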