[GitHub] [flink-ml] lindong28 commented on a diff in pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScale…

GitBox Thu, 05 Jan 2023 04:41:12 -0800


lindong28 commented on code in PR #193:
URL: https://github.com/apache/flink-ml/pull/193#discussion_r1062426366



##########
flink-ml-benchmark/src/test/java/org/apache/flink/ml/benchmark/BenchmarkTest.java:
##########
@@ -103,4 +166,32 @@ public void testRunBenchmark() throws Exception {
                 result.outputThroughput,
                 1e-5);
     }
+
+    @SuppressWarnings({"unchecked", "rawtypes"})
+    private void dryRunBenchmark(Map<String, Map<String, ?>> params)

Review Comment:
   It seems that this method duplicates a lot of existing logic implemented by 
`BenchmarkUtils#runBenchmark`. Would it be simpler to just add a `boolean 
isDryRun` parameter to that method?



##########
flink-ml-benchmark/src/main/resources/minmaxscaler-benchmark.json:
##########
@@ -22,19 +22,15 @@
         "vectorDim": 100,
         "colNames": [
           [
-            "features"
+            "inputCol"

Review Comment:
   Can you verify that this configuration still works after the change?
   
   Same for other configuration files updated by this PR.



##########
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/clustering/KMeansModelDataGenerator.java:
##########
@@ -55,20 +61,36 @@ public Table[] getData(StreamTableEnvironment tEnv) {
         InputDataGenerator<?> vectorArrayGenerator = new 
DenseVectorArrayGenerator();
         ReadWriteUtils.updateExistingParams(vectorArrayGenerator, paramMap);
         vectorArrayGenerator.setNumValues(1);
+        vectorArrayGenerator.setColNames(new String[] {"centroids"});

Review Comment:
   Why do we need to make this change? This is change necessary to add 
benchmarks for algorithms targeted by this PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-ml] lindong28 commented on a diff in pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScale…

Reply via email to