[GitHub] [flink-ml] lindong28 commented on a diff in pull request #111: [FLINK-27877] Improve StringIndexer performance and optimize flink-ml-bench

GitBox Fri, 17 Jun 2022 07:55:38 -0700


lindong28 commented on code in PR #111:
URL: https://github.com/apache/flink-ml/pull/111#discussion_r900198528



##########
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/BenchmarkUtils.java:
##########
@@ -99,6 +99,7 @@ private static BenchmarkResult runBenchmark(
             DataGenerator<?> modelDataGenerator)
             throws Exception {
         StreamExecutionEnvironment env = 
TableUtils.getExecutionEnvironment(tEnv);
+        env.getConfig().enableObjectReuse();

Review Comment:
   According to [1], `enableObjectReuse` can lead to bugs when the user-code 
function of an operation is not aware of this behavior. So we can avoid bugs if 
users take care when writing UDF.
   
   If we don't enable object re-use here, users won't be able to reproduce the 
best possible performance by running our benchmarks, which weakens the purpose 
of adding these benchmarks.
   
   I am not sure the two points mentioned above are valid concerns:
   - We should be able to make sure that all Flink ML algorithms can run with 
object-reuse enabled. And it is useful to catch and fix issues by running 
benchmarks with this config.
   - User's production code typically won't use BenchmarkUtils. Thus their 
production code won't be directly affected by the `enableObjectReuse()` added 
here.
   - While Flink users can change configuration parameters, it is still 
generally preferred to have our benchmark configuration follow best practices 
so that users can simply learn our benchmark code to get the best possible 
performance.
   
   [1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/execution_configuration/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-ml] lindong28 commented on a diff in pull request #111: [FLINK-27877] Improve StringIndexer performance and optimize flink-ml-bench

Reply via email to