lindong28 commented on code in PR #111: URL: https://github.com/apache/flink-ml/pull/111#discussion_r900198528
########## flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/BenchmarkUtils.java: ########## @@ -99,6 +99,7 @@ private static BenchmarkResult runBenchmark( DataGenerator<?> modelDataGenerator) throws Exception { StreamExecutionEnvironment env = TableUtils.getExecutionEnvironment(tEnv); + env.getConfig().enableObjectReuse(); Review Comment: According to [1], `enableObjectReuse` can lead to bugs when the user-code function of an operation is not aware of this behavior. So we can avoid bugs if users take care when writing UDF. If we don't enable object re-use here, users won't be able to reproduce the best possible performance by running our benchmarks, which weakens the purpose of adding these benchmarks. I am not sure the two points mentioned above are valid concerns: - We should be able to make sure that all Flink ML algorithms can run with object-reuse enabled. And it is useful to catch and fix issues by running benchmarks with this config. - User's production code typically won't use BenchmarkUtils. Thus their production code won't be directly affected by the `enableObjectReuse()` added here. - While Flink users can change configuration parameters, it is still generally preferred to have our benchmark configuration follow best practices so that users can simply learn our benchmark code to get the best possible performance. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/execution_configuration/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org