[ https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812967#comment-16812967 ]
yankai zhang commented on FLINK-12113:
--------------------------------------

Yes, _fromCollection(Iterator, Class)_ works as expected without an anonymous class. The problem is that an anonymous class instance created in an instance method implicitly references the outer _this_ (even though it never uses it), and the outer _this_ is not serializable. Stripping such unused references is exactly what _StreamExecutionEnvironment#clean_ is supposed to do.

In fact, the iterator passed by the user is wrapped in a _FromIteratorFunction_, and _StreamExecutionEnvironment#clean_ is then called on that wrapper instance, not on the iterator itself. However, the current implementation of _StreamExecutionEnvironment#clean_ is not recursive, so it cannot find and clean a _this_ reference nested deeper in the closure.

Here is my fully reproducible code:

{code:java}
import java.io.Serializable;
import java.util.Iterator;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test;

public class MainTest {
    interface IS<E> extends Iterator<E>, Serializable { }

    @Test
    public void cleanTest() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The anonymous class below implicitly captures MainTest.this, which is not serializable.
        env.fromCollection(new IS<Object>() {
            @Override
            public boolean hasNext() {
                return false;
            }

            @Override
            public Object next() {
                return null;
            }
        }, Object.class);
    }
}
{code}

> User code passing to fromCollection(Iterator, Class) not cleaned
> ----------------------------------------------------------------
>
>                 Key: FLINK-12113
>                 URL: https://issues.apache.org/jira/browse/FLINK-12113
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.7.2
>            Reporter: yankai zhang
>            Priority: Major
>         Attachments: image-2019-04-07-21-52-37-264.png, image-2019-04-08-23-19-27-359.png
>
>
> {code:java}
> interface IS<E> extends Iterator<E>, Serializable { }
>
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(new IS<Object>() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }, Object.class);
> {code}
> The code above throws an exception:
> {code:java}
> org.apache.flink.api.common.InvalidProgramException: The implementation of the SourceFunction is not serializable. The object probably contains or references non serializable fields.
>     at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
> ....
> {code}
> My workaround is to wrap the iterator instance in a call to _clean_, like this:
>
> {code:java}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(env.clean(new IS<Object>() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }), Object.class);
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
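The capture problem described in the comment can be reproduced without Flink. Below is a minimal, self-contained sketch in plain Java, built on stated assumptions. `Wrapper` is a hypothetical stand-in for _FromIteratorFunction_, and `clean(...)` is a hypothetical reflection-based cleaner, not Flink's _ClosureCleaner_ API.

The sketch shows three things:
* An anonymous class created in an instance method carries a synthetic `this$0` field that points at the non-serializable enclosing instance.
* Cleaning only the wrapper's own fields leaves that reference in place.
* Descending recursively into the wrapped iterator removes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.Map;

// CaptureDemo is deliberately NOT Serializable, like MainTest in the report.
public class CaptureDemo {
    // Same shape as the IS interface in the report: a serializable iterator.
    interface IS<E> extends Iterator<E>, Serializable { }

    // Hypothetical stand-in for Flink's FromIteratorFunction: a serializable
    // wrapper that just keeps the user's iterator in a field.
    static class Wrapper implements Serializable {
        final Iterator<?> iterator;
        Wrapper(Iterator<?> iterator) { this.iterator = iterator; }
    }

    // Anonymous class created in an instance method: javac gives it a synthetic
    // this$0 field pointing at the enclosing CaptureDemo, although hasNext() and
    // next() never use it.
    IS<Object> makeIterator() {
        return new IS<Object>() {
            @Override public boolean hasNext() { return false; }
            @Override public Object next() { return null; }
        };
    }

    // Returns true if Java serialization of obj succeeds.
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical cleaner (not Flink's API): nulls every synthetic this$N field whose
    // value is not Serializable. With recursive == false it only inspects obj's own
    // fields; with recursive == true it also descends into the objects those fields
    // reference. Classes in java.* packages are skipped to avoid reflective access errors.
    static void clean(Object obj, boolean recursive, Map<Object, Boolean> visited) {
        if (obj == null || visited.put(obj, Boolean.TRUE) != null) {
            return; // nothing to do, or already visited (guards against cycles)
        }
        for (Class<?> c = obj.getClass(); c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    Object value = f.get(obj);
                    if (value == null) {
                        continue;
                    }
                    if (f.isSynthetic() && f.getName().startsWith("this$") && !(value instanceof Serializable)) {
                        f.set(obj, null); // drop the non-serializable enclosing instance
                    } else if (recursive && !value.getClass().getName().startsWith("java.")) {
                        clean(value, true, visited); // look inside nested user objects
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        Wrapper wrapper = new Wrapper(new CaptureDemo().makeIterator());
        System.out.println("before cleaning:       " + isSerializable(wrapper));
        clean(wrapper, false, new IdentityHashMap<>()); // like cleaning only the wrapper
        System.out.println("after shallow clean:   " + isSerializable(wrapper));
        clean(wrapper, true, new IdentityHashMap<>());  // also cleans the wrapped iterator
        System.out.println("after recursive clean: " + isSerializable(wrapper));
    }
}
```

Running `main` should print `false`, `false`, then `true`. This sketch nulls any non-serializable `this$` field unconditionally, without checking whether the inner class actually uses its enclosing instance. It would therefore break an inner class that really does depend on its outer object; a production cleaner has to be more careful about when it drops that reference.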