lindong28 commented on code in PR #193: URL: https://github.com/apache/flink-ml/pull/193#discussion_r1062426366
########## flink-ml-benchmark/src/test/java/org/apache/flink/ml/benchmark/BenchmarkTest.java: ########## @@ -103,4 +166,32 @@ public void testRunBenchmark() throws Exception { result.outputThroughput, 1e-5); } + + @SuppressWarnings({"unchecked", "rawtypes"}) + private void dryRunBenchmark(Map<String, Map<String, ?>> params) Review Comment: It seems that this method duplicates a lot of existing logic implemented by `BenchmarkUtils#runBenchmark`. Would it be simpler to just add a `boolean isDryRun` parameter to that method? ########## flink-ml-benchmark/src/main/resources/minmaxscaler-benchmark.json: ########## @@ -22,19 +22,15 @@ "vectorDim": 100, "colNames": [ [ - "features" + "inputCol" Review Comment: Can you verify that this configuration still works after the change? Same for other configuration files updated by this PR. ########## flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/clustering/KMeansModelDataGenerator.java: ########## @@ -55,20 +61,36 @@ public Table[] getData(StreamTableEnvironment tEnv) { InputDataGenerator<?> vectorArrayGenerator = new DenseVectorArrayGenerator(); ReadWriteUtils.updateExistingParams(vectorArrayGenerator, paramMap); vectorArrayGenerator.setNumValues(1); + vectorArrayGenerator.setColNames(new String[] {"centroids"}); Review Comment: Why do we need to make this change? This is change necessary to add benchmarks for algorithms targeted by this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org