Flink performs significant scanning during the pre-flight phase of a Flink
application
(https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html).
Creating sources, operators and sinks causes Flink to scan the data types of
the objects used within the topology of a given streaming flow, apparently
because Flink tries to optimise jobs based on this information.

Is this scanning configurable? Can I turn it off and force Flink to use Kryo
serialisation exclusively, without needing or using any of this scanned information?

I have a very large, deeply nested class in a proprietary library that was
auto-generated, and Flink seems to get stuck in an endless loop when scanning
it. This results in out-of-memory errors after running for several hours (the
application never actually launches via env.execute(), even if I increase the
heap size significantly). The class has a number of circular references, i.e.
the class and its child classes contain references to other classes of the
same type. Is this likely to be a problem?
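A minimal sketch of the kind of structure described above, with hypothetical class names (Parent, Child) standing in for the generated classes, plus a naive reflective walk over field types. This is not Flink's type extraction code; it only illustrates why any recursive scan over such a class graph needs cycle detection: without the visited set, Parent -> Child -> Parent would be revisited forever.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

public class TypeWalk {

    // Hypothetical stand-ins for the auto-generated classes: a parent that
    // refers to a child and to itself, and a child that refers back to the parent.
    static class Parent {
        Child child;
        Parent self;
    }

    static class Child {
        Parent owner;
        Child next;
        String label;
    }

    // Collects every class reachable through declared, non-static fields.
    // The "visited" set is what stops the walk from cycling on Parent -> Child -> Parent.
    static Set<Class<?>> reachableTypes(Class<?> root) {
        Set<Class<?>> visited = new LinkedHashSet<>();
        Deque<Class<?>> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Class<?> current = pending.pop();
            // Skip primitives, JDK types such as String, and anything already seen.
            if (current.isPrimitive()
                    || current.getName().startsWith("java.")
                    || !visited.add(current)) {
                continue;
            }
            for (Field field : current.getDeclaredFields()) {
                if (!Modifier.isStatic(field.getModifiers())) {
                    pending.push(field.getType());
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Terminates only because of the visited check in reachableTypes.
        System.out.println(reachableTypes(Parent.class));
    }
}
```

The walk finishes after visiting each class once, however many circular references the fields contain; a version that recursed on every field type without remembering what it had already seen would never return.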

Many thanks,

John
