zhipeng93 commented on code in PR #188:
URL: https://github.com/apache/flink-ml/pull/188#discussion_r1060480557


##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/binarizer/Binarizer.java:
##########
@@ -70,11 +71,12 @@ public Table[] transform(Table... inputs) {
 
         for (int i = 0; i < inputCols.length; ++i) {
             int idx = inputTypeInfo.getFieldIndex(inputCols[i]);
-            if (inputTypeInfo.getFieldTypes()[idx] instanceof SparseVectorTypeInfo) {
+            Class<?> typeClass = inputTypeInfo.getTypeAt(idx).getTypeClass();
+            if (typeClass.equals(SparseVector.class)) {
                 outputTypes[i] = SparseVectorTypeInfo.INSTANCE;
-            } else if (inputTypeInfo.getFieldTypes()[idx] instanceof DenseVectorTypeInfo) {
+            } else if (typeClass.equals(DenseVector.class)) {

Review Comment:
   Thanks for the comment. The benchmark results show that using `ExternalTypeInfo` is acceptable (benchmark code: [1]): in every case below, the two scores differ by less than the reported error. The detailed numbers are listed below:
   
   # For DenseVector
   - Using `ExternalTypeInfo.of()` for dense vectors of dimension 1000:
   ```
   Benchmark                   (numDataPoints)  Mode  Cnt      Score      Error  Units
   DenseVectorBench.benchmark            10000  avgt    5   1253.435 ±  278.433  ms/op
   DenseVectorBench.benchmark           100000  avgt    5   2798.003 ±  143.622  ms/op
   DenseVectorBench.benchmark          1000000  avgt    5  20133.060 ± 5417.113  ms/op
   ```
   - Using `TypeInformation.of()` for dense vectors of dimension 1000:
   ```
   Benchmark                   (numDataPoints)  Mode  Cnt      Score      Error  Units
   DenseVectorBench.benchmark            10000  avgt    5   1308.907 ±  295.118  ms/op
   DenseVectorBench.benchmark           100000  avgt    5   2796.361 ±  262.703  ms/op
   DenseVectorBench.benchmark          1000000  avgt    5  19582.051 ± 4156.113  ms/op
   ```
   # For Long
   - Using `ExternalTypeInfo.of()` for Long:
   ```
   Benchmark                       (numDataPoints)  Mode  Cnt      Score      Error  Units
   DenseVectorBench.benchmarkLong          1000000  avgt    5   1303.187 ±  194.059  ms/op
   DenseVectorBench.benchmarkLong         10000000  avgt    5   3445.889 ±  314.656  ms/op
   DenseVectorBench.benchmarkLong        100000000  avgt    5  25728.956 ± 2897.805  ms/op
   ```
   - Using `TypeInformation.of()` for Long:
   ```
   Benchmark                       (numDataPoints)  Mode  Cnt      Score      Error  Units
   DenseVectorBench.benchmarkLong          1000000  avgt    5   1472.226 ±  206.726  ms/op
   DenseVectorBench.benchmarkLong         10000000  avgt    5   3714.417 ±  487.310  ms/op
   DenseVectorBench.benchmarkLong        100000000  avgt    5  25715.854 ± 2427.714  ms/op
   ```
   
   [1] https://github.com/zhipeng93/flink-ml/commit/c0916dcf3f6afe6a6653de5374564544ef4d19b5
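
As a rough sanity check on the numbers above, the following sketch tests whether the `ExternalTypeInfo.of()` and `TypeInformation.of()` score ranges overlap for each data size. It assumes the Error column can be read as a symmetric margin around Score. The class and method names (`BenchmarkOverlap`, `intervalsOverlap`, `relativeDiffPercent`) are hypothetical and are not part of the linked benchmark code.

```java
import java.util.Locale;

// Hypothetical helper, not part of the benchmark in [1].
final class BenchmarkOverlap {

    // True when [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point.
    static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    // Signed difference of 'other' relative to 'base', in percent.
    static double relativeDiffPercent(double base, double other) {
        return (other - base) / base * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "DenseVector 10000", "DenseVector 100000", "DenseVector 1000000",
            "Long 1000000", "Long 10000000", "Long 100000000"
        };
        // Each row: ExternalTypeInfo score, error, TypeInformation score, error (ms/op),
        // copied from the tables in this comment.
        double[][] rows = {
            {1253.435, 278.433, 1308.907, 295.118},
            {2798.003, 143.622, 2796.361, 262.703},
            {20133.060, 5417.113, 19582.051, 4156.113},
            {1303.187, 194.059, 1472.226, 206.726},
            {3445.889, 314.656, 3714.417, 487.310},
            {25728.956, 2897.805, 25715.854, 2427.714}
        };
        for (int i = 0; i < rows.length; i++) {
            double[] r = rows[i];
            System.out.printf(Locale.ROOT, "%-20s diff=%+7.2f%% overlap=%b%n",
                    labels[i],
                    relativeDiffPercent(r[0], r[2]),
                    intervalsOverlap(r[0], r[1], r[2], r[3]));
        }
    }
}
```

Overlapping ranges only suggest that the difference is within measurement noise for these runs; this is not a formal significance test.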


