luoyuxia commented on code in PR #20988:
URL: https://github.com/apache/flink/pull/20988#discussion_r999205523


##########
flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/data/conversion/ArrayObjectArrayConverter.java:
##########
@@ -100,7 +100,6 @@ public E[] toExternal(ArrayData internal) {
             if (genericArray.isPrimitiveArray()) {
                return genericToJavaArrayConverter.convert((GenericArrayData) internal);
             }
-            return (E[]) genericArray.toObjectArray();

Review Comment:
   I'm wondering whether it's a good idea to always remove this line and fall back to `toJavaArray(internal)`.
   From my side, `(E[]) genericArray.toObjectArray()` seems to be an optimization compared to `toJavaArray(internal)`, since it returns the backing object array directly instead of converting element by element.
   In the vectorized read path it must fall back to `toJavaArray(internal)`, otherwise it will fail.
   But in the non-vectorized path everything is fine even if we keep this line.
   For example, if we force the array read to be non-vectorized by modifying the method `isVectorizationUnsupported`, the test still passes without removing this line.
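   To make the trade-off concrete, here is a small sketch with simplified stand-ins for Flink's internal classes (`FakeGenericArrayData` and `toJavaArray` below are hypothetical names for illustration only; the real types live in `flink-table-runtime`). The point is that when the backing array already has the right runtime type, a plain cast is zero-copy, while the generic fallback does an O(n) element-by-element conversion:

   ```java
   import java.util.Arrays;
   import java.util.function.Function;

   // Simplified stand-ins to illustrate the reviewer's point; not Flink's real classes.
   public class ArrayConversionSketch {

       // Mimics GenericArrayData: it already holds boxed Java objects.
       static final class FakeGenericArrayData {
           private final Object[] elements;

           FakeGenericArrayData(Object[] elements) {
               this.elements = elements;
           }

           Object[] toObjectArray() {
               // Fast path: the backing array is returned as-is, no copy.
               return elements;
           }

           int size() {
               return elements.length;
           }

           Object get(int i) {
               return elements[i];
           }
       }

       // Mimics the generic toJavaArray(internal) fallback: converts element by element.
       static Integer[] toJavaArray(
               FakeGenericArrayData internal, Function<Object, Integer> elementConverter) {
           Integer[] out = new Integer[internal.size()];
           for (int i = 0; i < internal.size(); i++) {
               out[i] = elementConverter.apply(internal.get(i));
           }
           return out;
       }

       public static void main(String[] args) {
           // The backing array's runtime type is Integer[], so the cast succeeds.
           FakeGenericArrayData data = new FakeGenericArrayData(new Integer[] {1, 2, 3});

           // Optimized path, analogous to (E[]) genericArray.toObjectArray(): O(1), no copy.
           Integer[] fast = (Integer[]) data.toObjectArray();

           // Generic fallback path, analogous to toJavaArray(internal): O(n) conversion.
           Integer[] slow = toJavaArray(data, o -> (Integer) o);

           System.out.println(fast == data.toObjectArray()); // same reference: zero-copy
           System.out.println(Arrays.equals(fast, slow)); // both paths agree on content
       }
   }
   ```

   Note the cast only works because `GenericArrayData` can hold an object array whose runtime type already matches; for a vectorized `ColumnarArrayData` there is no such backing array, which is exactly why the fallback is needed there.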
   



##########
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableSourceITCase.java:
##########
@@ -190,6 +190,29 @@ public void testReadParquetComplexDataType() throws Exception {
         batchTableEnv.unloadModule("hive");
     }
 
+    @Test
+    public void testReadParquetArrayDataType() throws Exception {
+        batchTableEnv.executeSql(
+                "create table parquet_complex_type_test("
+                        + "a array<int>, m map<int,string>, s struct<f1:int,f2:bigint>) stored as parquet");
+        // load hive module so that we can use array,map, named_struct function
+        // for convenient writing complex data
+        batchTableEnv.loadModule("hive", new HiveModule());
+        batchTableEnv.useModules("hive", CoreModuleFactory.IDENTIFIER);
+
+        batchTableEnv
+                .executeSql(
+                        "insert into parquet_complex_type_test"
+                                + " select array(1, 2), map(1, 'val1', 2, 'val2'),"
+                                + " named_struct('f1', 1,  'f2', 2)")
+                .await();
+
+        Table src = batchTableEnv.sqlQuery("select a[1], a[3] from parquet_complex_type_test");

Review Comment:
   I think we can just move the test to `testReadParquetComplexDataType`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
