Jiangjie Qin created FLINK-32398: ------------------------------------ Summary: Support Avro SpecificRecord in DataStream and Table conversion. Key: FLINK-32398 URL: https://issues.apache.org/jira/browse/FLINK-32398 Project: Flink Issue Type: New Feature Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.17.1 Reporter: Jiangjie Qin
At this point, it seems that Avro SpecificRecord is not supported in DataStream and Table conversion. For example, the following code breaks when MyAvroRecord contains fields of type Record, Enum, Array, etc. {code:java} ing schemaString = MyAvroRecord.getClassSchema().toString(); DataType dataType = AvroSchemaConverter.convertToDataType(schemaString); TypeInformation<MyAvroRecord> typeInfo = AvroSchemaConverter.convertToTypeInfo(schemaString);; input.getTransformation().setOutputType(typeInfo); tEnv.createTemporaryView("myTable", input); Table result = tEnv.sqlQuery("SELECT * FROM myTable"); DataStream<MyAvroRecord> output = tEnv.toDataStream(result, dataType); output.getTransformation().setOutputType(typeInfo); {code} While the conversion from {{MyAvroRecord}} to {{RowData}} seems fine, several issues were there when converting the {{RowData}} back to {{{}MyAvroRecord{}}}, including but not limited to: # {{AvroSchemaConverter.convertToDataType(schema)}} maps Avro Record type to RowType, which loses the class information. # {{AvroSchemaConverter}} maps Enum to StringType, and simply try to cast the string to the Enum. I did not find a way to easily convert the between DataStream and Table for Avro SpecificRecord. Given the popularity of Avro SpecificRecord, we should support this. -- This message was sent by Atlassian Jira (v8.20.10#820010)