[ https://issues.apache.org/jira/browse/FLINK-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733825#comment-14733825 ]
ASF GitHub Bot commented on FLINK-2631:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1101

    [FLINK-2631] [streaming] Fixes the StreamFold operator and adds output type configurable stream operators

    Adds support for non-serializable initial fold values by storing the value in a byte array before shipping it. The shipped initial fold value is deserialized on the TM when the `open` method is called.

    Furthermore, this PR introduces the `OutputTypeConfigurable` interface, which allows stream operators to learn their output type. The interface offers the method `setOutputType`, which is called by the `StreamGraph` when the `StreamOperator` is added in the `addOperator` method. At this point at the latest, the concrete output type, whether inferred from the UDF or set manually with `returns`, should be known to the system, because the input and output type serializers for the vertex are also created in the `addOperator` method. All stream operators which need to know their output type should implement the `OutputTypeConfigurable` interface.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixStreamingFold

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1101

----
commit 63951adca0e8bfefd1d81b933017e9fadc5f556f
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-09-07T09:34:48Z

    [FLINK-2631] [streaming] Fixes the StreamFold operator. Adds OutputTypeConfigurable interface to support type injection at StreamGraph creation.

    Adds test for non serializable fold type. Adds test to verify proper output type forwarding for OutputTypeConfigurable implementations.

    Adds comments

----

> StreamFold operator does not respect returns type and stores non serializable values
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-2631
>                 URL: https://issues.apache.org/jira/browse/FLINK-2631
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>
> The {{StreamFold}} operator stores the initial value of the fold operation
> for the task deployment. This value does not necessarily have to be
> serializable. Thus, using the fold operation with a non-serializable initial
> value will fail the job.
> Moreover, the {{StreamFold}} operator needs to know the output type in order
> to create a {{TypeSerializer}}. For {{StreamGraphs}} where the output type is
> not known when the operator is created, as is the case for the Scala
> DataStream API, which sets the output type via the {{returns}} method only
> after creating the operator, this approach will fail. The reason is
> that the {{StreamFold}} operator does not receive the type information set by
> the {{returns}} method. Therefore, the job will fail at runtime because the
> operator tries to create a serializer from a {{MissingTypeInformation}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
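
As a rough illustration of the approach described in the pull request above (not the code from the PR itself), the following self-contained Java sketch shows an operator that receives its output type late, writes a non-serializable initial fold value into a byte array at that point, and restores it in `open` after being shipped. All names here (`FoldSketch`, `SimpleSerializer`, `OutputTypeAware`, `Folder`, `Counter`, `ship`) are hypothetical stand-ins and not Flink APIs; plain Java serialization is assumed as a substitute for shipping the operator to the TM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FoldSketch {

    /** Hypothetical stand-in for a type serializer (not Flink's TypeSerializer). */
    interface SimpleSerializer<T> extends Serializable {
        void write(T value, DataOutputStream out) throws IOException;
        T read(DataInputStream in) throws IOException;
    }

    /** Hypothetical analogue of the output-type injection idea from the PR. */
    interface OutputTypeAware<T> {
        void setOutputType(SimpleSerializer<T> serializer) throws IOException;
    }

    /** Fold function; Serializable so it can be shipped with the operator. */
    interface Folder<IN, OUT> extends Serializable {
        OUT fold(OUT accumulator, IN value);
    }

    /** Accumulator type that deliberately does NOT implement Serializable. */
    static final class Counter {
        final int count;
        Counter(int count) { this.count = count; }
    }

    static final class CounterSerializer implements SimpleSerializer<Counter> {
        public void write(Counter value, DataOutputStream out) throws IOException {
            out.writeInt(value.count);
        }
        public Counter read(DataInputStream in) throws IOException {
            return new Counter(in.readInt());
        }
    }

    static final class AddToCounter implements Folder<Integer, Counter> {
        public Counter fold(Counter accumulator, Integer value) {
            return new Counter(accumulator.count + value);
        }
    }

    static final class FoldOperator<IN, OUT> implements OutputTypeAware<OUT>, Serializable {
        private final Folder<IN, OUT> folder;
        private transient OUT initialValue;     // never shipped directly
        private byte[] serializedInitialValue;  // shipped instead of the object
        private SimpleSerializer<OUT> serializer;
        private transient OUT accumulator;      // rebuilt in open()

        FoldOperator(Folder<IN, OUT> folder, OUT initialValue) {
            this.folder = folder;
            this.initialValue = initialValue;
        }

        /** Called once the output type is known, before the operator is shipped. */
        @Override
        public void setOutputType(SimpleSerializer<OUT> serializer) throws IOException {
            this.serializer = serializer;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serializer.write(initialValue, new DataOutputStream(bytes));
            serializedInitialValue = bytes.toByteArray();
        }

        /** Called on the receiving side: restores the initial value from the bytes. */
        void open() throws IOException {
            accumulator = serializer.read(
                new DataInputStream(new ByteArrayInputStream(serializedInitialValue)));
        }

        OUT process(IN value) {
            accumulator = folder.fold(accumulator, value);
            return accumulator;
        }
    }

    /** Simulates shipping an object by a Java serialization round trip. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T ship(T object) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    /** Runs the full round trip and returns the final folded count. */
    public static int runExample() {
        try {
            FoldOperator<Integer, Counter> operator =
                new FoldOperator<>(new AddToCounter(), new Counter(10));
            operator.setOutputType(new CounterSerializer()); // type injected late
            FoldOperator<Integer, Counter> shipped = ship(operator);
            shipped.open();
            shipped.process(1);
            return shipped.process(2).count;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // prints 13
    }
}
```

In this sketch, shipping succeeds even though `Counter` is not `Serializable`, because only the byte array and the serializer travel with the operator. Calling `ship` before `setOutputType` would leave `serializedInitialValue` unset, which mirrors why the output type has to be injected before deployment.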