Jonathan Vexler created HUDI-9607:
-------------------------------------
Summary: Flink VARBINARY in array and map field index oob read issue
Key: HUDI-9607
URL: https://issues.apache.org/jira/browse/HUDI-9607
Project: Apache Hudi
Issue Type: Bug
Components: flink, reader-core
Affects Versions: 1.0.2
Reporter: Jonathan Vexler
Fix For: 1.1.0
{code:java}
java.lang.RuntimeException: java.lang.IllegalArgumentException: 72 > 36
    at org.apache.hudi.common.table.read.TestHoodieFileGroupReaderBase.lambda$readRecordsFromFileGroup$9(TestHoodieFileGroupReaderBase.java:698)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.apache.hudi.common.table.read.TestHoodieFileGroupReaderBase.readRecordsFromFileGroup(TestHoodieFileGroupReaderBase.java:691)
    at org.apache.hudi.common.table.read.TestHoodieFileGroupReaderBase.validateOutputFromFileGroupReaderWithNativeRecords(TestHoodieFileGroupReaderBase.java:560)
    at org.apache.hudi.common.table.read.TestHoodieFileGroupReaderBase.testSchemaEvolutionWhenBaseFilesWithDifferentSchema(TestHoodieFileGroupReaderBase.java:244)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
Caused by: java.lang.IllegalArgumentException: 72 > 36
    at java.util.Arrays.copyOfRange(Arrays.java:3519)
    at org.apache.flink.table.data.columnar.ColumnarArrayData.getBinary(ColumnarArrayData.java:138)
    at org.apache.hudi.table.format.cow.vector.ColumnarGroupRowData.getBinary(ColumnarGroupRowData.java:121)
    at org.apache.flink.table.data.RowData.lambda$createFieldGetter$245ca7d1$3(RowData.java:228)
    at org.apache.flink.table.runtime.typeutils.RowDataSerializer.toBinaryRow(RowDataSerializer.java:207)
    at org.apache.flink.table.data.writer.AbstractBinaryWriter.writeRow(AbstractBinaryWriter.java:147)
    at org.apache.flink.table.data.writer.BinaryArrayWriter.writeRow(BinaryArrayWriter.java:30)
    at org.apache.flink.table.data.writer.BinaryWriter.write(BinaryWriter.java:155)
{code}
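One possible reading of the `72 > 36` message: `Arrays.copyOfRange(original, from, to)` throws `IllegalArgumentException` when `from` is greater than `to`, so the call reached from `ColumnarArrayData.getBinary` appears to have received a start offset of 72 and an end index of 36. That would be consistent with a length being passed where an exclusive end index is expected, although this ticket does not confirm the root cause. A minimal sketch of that failure mode, in which the class name, the helper names, and the buffer layout are all hypothetical:

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Hudi or Flink.
final class BinarySliceDemo {

    // Assumed failure mode: the third argument of copyOfRange is an
    // exclusive END index, but a LENGTH is passed instead.
    public static byte[] sliceWithLengthAsEnd(byte[] data, int offset, int len) {
        return Arrays.copyOfRange(data, offset, len);
    }

    // Passes offset + len as the exclusive end index.
    public static byte[] sliceWithEndIndex(byte[] data, int offset, int len) {
        return Arrays.copyOfRange(data, offset, offset + len);
    }

    public static void main(String[] args) {
        // Assumed layout: three 36-byte values stored back to back
        // in one shared buffer, at offsets 0, 36, and 72.
        byte[] data = new byte[108];

        for (int offset = 0; offset < data.length; offset += 36) {
            try {
                byte[] slice = sliceWithLengthAsEnd(data, offset, 36);
                System.out.println("offset " + offset + ": length-as-end slice has "
                    + slice.length + " bytes");
            } catch (IllegalArgumentException e) {
                System.out.println("offset " + offset + ": length-as-end slice failed: "
                    + e.getMessage());
            }
            byte[] expected = sliceWithEndIndex(data, offset, 36);
            System.out.println("offset " + offset + ": end-index slice has "
                + expected.length + " bytes");
        }
    }
}
```

Under these assumptions, only the value at offset 0 is read correctly by the length-as-end variant. The value at offset 36 comes back as an empty array without any error, and the read at offset 72 is the first one to throw, which may be why the failure only shows up when several binary values share one buffer, as in array and map fields.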
Schema of the offending field:
{code:java}
{
  "type" : "map",
  "values" : {
    "type" : "record",
    "name" : "customMapRecord",
    "doc" : "",
    "fields" : [ {
      "name" : "customFieldMap0",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap1",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap2",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap3",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap4",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap5",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap6",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap7",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap8",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap9",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap10",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap11",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap12",
      "type" : "double",
      "doc" : ""
    }, {
      "name" : "customFieldMap13",
      "type" : "double",
      "doc" : ""
    }, {
      "name" : "customFieldMap14",
      "type" : "string",
      "doc" : ""
    }, {
      "name" : "customFieldMap15",
      "type" : "string",
      "doc" : ""
    }, {
      "name" : "customFieldMap16",
      "type" : "bytes",
      "doc" : ""
    }, {
      "name" : "customFieldMap17",
      "type" : "bytes",
      "doc" : ""
    } ]
  }
}
{code}
Schema generated with the flag set to exclude bytes fields (this schema does not cause the failure):
{code:java}
{
  "type" : "map",
  "values" : {
    "type" : "record",
    "name" : "customMapRecord",
    "doc" : "",
    "fields" : [ {
      "name" : "customFieldMap0",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap1",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap2",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap3",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap4",
      "type" : "int",
      "doc" : ""
    }, {
      "name" : "customFieldMap5",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap6",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap7",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap8",
      "type" : "long",
      "doc" : ""
    }, {
      "name" : "customFieldMap9",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap10",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap11",
      "type" : "float",
      "doc" : ""
    }, {
      "name" : "customFieldMap12",
      "type" : "double",
      "doc" : ""
    }, {
      "name" : "customFieldMap13",
      "type" : "double",
      "doc" : ""
    }, {
      "name" : "customFieldMap14",
      "type" : "string",
      "doc" : ""
    }, {
      "name" : "customFieldMap15",
      "type" : "string",
      "doc" : ""
    }, {
      "name" : "customFieldMap16",
      "type" : "string",
      "doc" : ""
    }, {
      "name" : "customFieldMap17",
      "type" : "string",
      "doc" : ""
    } ]
  }
}
{code}
The test flag that exposes the error is `supportBytesInArrayMap`. There are also TODOs
marking code to remove once this is fixed.