zhipeng93 commented on code in PR #188:
URL: https://github.com/apache/flink-ml/pull/188#discussion_r1060480557
##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/binarizer/Binarizer.java:
##########
@@ -70,11 +71,12 @@ public Table[] transform(Table... inputs) {
for (int i = 0; i < inputCols.length; ++i) {
int idx = inputTypeInfo.getFieldIndex(inputCols[i]);
- if (inputTypeInfo.getFieldTypes()[idx] instanceof SparseVectorTypeInfo) {
+ Class<?> typeClass = inputTypeInfo.getTypeAt(idx).getTypeClass();
+ if (typeClass.equals(SparseVector.class)) {
outputTypes[i] = SparseVectorTypeInfo.INSTANCE;
- } else if (inputTypeInfo.getFieldTypes()[idx] instanceof DenseVectorTypeInfo) {
+ } else if (typeClass.equals(DenseVector.class)) {
Review Comment:
Thanks for the comment. The benchmark results show that using
`ExternalTypeInfo` is acceptable: it performs comparably to
`TypeInformation.of()` (benchmark code: [1]). The detailed benchmark
numbers are listed below:
# For DenseVector
- Using `ExternalTypeInfo.of()` for dense vectors of dimension 1000:
```
Benchmark                   (numDataPoints)  Mode  Cnt      Score       Error  Units
DenseVectorBench.benchmark            10000  avgt    5   1253.435 ±   278.433  ms/op
DenseVectorBench.benchmark           100000  avgt    5   2798.003 ±   143.622  ms/op
DenseVectorBench.benchmark          1000000  avgt    5  20133.060 ±  5417.113  ms/op
```
- Using `TypeInformation.of()` for dense vectors of dimension 1000:
```
Benchmark                   (numDataPoints)  Mode  Cnt      Score       Error  Units
DenseVectorBench.benchmark            10000  avgt    5   1308.907 ±   295.118  ms/op
DenseVectorBench.benchmark           100000  avgt    5   2796.361 ±   262.703  ms/op
DenseVectorBench.benchmark          1000000  avgt    5  19582.051 ±  4156.113  ms/op
```
# For Long
- Using `ExternalTypeInfo.of()` for Long:
```
Benchmark                       (numDataPoints)  Mode  Cnt      Score       Error  Units
DenseVectorBench.benchmarkLong          1000000  avgt    5   1303.187 ±   194.059  ms/op
DenseVectorBench.benchmarkLong         10000000  avgt    5   3445.889 ±   314.656  ms/op
DenseVectorBench.benchmarkLong        100000000  avgt    5  25728.956 ±  2897.805  ms/op
```
- Using `TypeInformation.of()` for Long:
```
Benchmark                       (numDataPoints)  Mode  Cnt      Score       Error  Units
DenseVectorBench.benchmarkLong          1000000  avgt    5   1472.226 ±   206.726  ms/op
DenseVectorBench.benchmarkLong         10000000  avgt    5   3714.417 ±   487.310  ms/op
DenseVectorBench.benchmarkLong        100000000  avgt    5  25715.854 ±  2427.714  ms/op
```
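As a rough sanity check on the numbers above, the sketch below compares each `ExternalTypeInfo.of()` score with the matching `TypeInformation.of()` score. It is not part of the benchmark code in [1]; the class name `BenchmarkComparison` and its helper methods are hypothetical, and it assumes the `±` column can be read as the half-width of an interval around the score.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: compares the scores reported above.
public class BenchmarkComparison {

    /** Relative difference of {@code other} against {@code base}, in percent. */
    public static double relativeDiffPercent(double base, double other) {
        return (other - base) / base * 100.0;
    }

    /** True if [a - aErr, a + aErr] and [b - bErr, b + bErr] overlap. */
    public static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    public static void main(String[] args) {
        String[] labels = {
            "DenseVector 10000", "DenseVector 100000", "DenseVector 1000000",
            "Long 1000000", "Long 10000000", "Long 100000000",
        };
        // Each row: {ExternalTypeInfo score, error, TypeInformation score, error},
        // all in ms/op, copied from the tables above.
        double[][] rows = {
            {1253.435, 278.433, 1308.907, 295.118},
            {2798.003, 143.622, 2796.361, 262.703},
            {20133.060, 5417.113, 19582.051, 4156.113},
            {1303.187, 194.059, 1472.226, 206.726},
            {3445.889, 314.656, 3714.417, 487.310},
            {25728.956, 2897.805, 25715.854, 2427.714},
        };
        for (int i = 0; i < rows.length; i++) {
            double[] r = rows[i];
            System.out.printf(
                    Locale.ROOT,
                    "%-20s diff=%+.2f%% overlap=%b%n",
                    labels[i],
                    relativeDiffPercent(r[0], r[2]),
                    intervalsOverlap(r[0], r[1], r[2], r[3]));
        }
    }
}
```

Under that reading, the intervals overlap in every row, which is consistent with treating the two approaches as equivalent in cost for these workloads.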
[1] https://github.com/zhipeng93/flink-ml/commit/c0916dcf3f6afe6a6653de5374564544ef4d19b5