Hello,
I'm reposting my Slack messages here.
I'm experimenting with ingesting JDBC data into Parquet, i.e. replicating
what spotify/dbeam does, using:
- JdbcIO.readRows()
- AvroUtils.getAvroSchema(beamRows.getSchema())
- AvroUtils.schemaCoder(avroSchema)
- AvroUtils.getRowToGenericRecordFunction(avroSchema)
Here are the issues I observed:
- DECIMAL(21,2) can't be handled because the scale parameter (2) is lost,
  so org.apache.avro.Conversions.DecimalConversion.validate() throws
    AvroTypeException("Cannot encode decimal with scale 2 as scale 0 without rounding")
  - It can be fixed by declaring the Beam Row schema field as
      FieldType.logicalType(FixedPrecisionNumeric.of(Integer.MAX_VALUE, 2))
    and then passing the scale on to the Avro schema as
      LogicalTypes.decimal(Integer.MAX_VALUE, ((RowWithStorage) (field.getType().getLogicalType()).getArgument()).getValue("scale")).addToSchema(Schema.create(Schema.Type.BYTES))
    (it might not be the best approach, though).
    I noticed https://github.com/apache/beam/issues/21226 and
    https://github.com/apache/beam/issues/20978, which might be related.
- INT16 is represented as-is in the Beam schema, but it maps to a 32-bit
  int in Avro, while the runtime value is a java.lang.Short. That causes:
    ClassCastException: class java.lang.Short cannot be cast to class java.lang.Integer (java.lang.Short and java.lang.Integer are in module java.base of loader 'bootstrap')
        at org.apache.beam.sdk.extensions.avro.schemas.utils.AvroUtils.convertAvroFieldStrict(AvroUtils.java:1299)
  I suppose this method could accept a Number and call intValue() on it. WDYT?
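To reproduce the DECIMAL failure without a database, here is a JDK-only sketch. The helpers validate and toBytes are hypothetical, not Avro's code: validate mirrors, as far as I can tell, the scale check in Conversions.DecimalConversion.validate() (rescale with RoundingMode.UNNECESSARY, fail otherwise), and toBytes mimics how the decimal logical type on BYTES stores only the unscaled value, leaving the scale to the schema.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

final class DecimalScaleSketch {

  // Hypothetical helper mirroring (as I read it) Avro's DecimalConversion scale
  // check: a value whose scale differs from the schema scale is rescaled only
  // if that needs no rounding; otherwise encoding fails.
  static BigDecimal validate(BigDecimal value, int schemaScale) {
    if (value.scale() != schemaScale) {
      try {
        return value.setScale(schemaScale, RoundingMode.UNNECESSARY);
      } catch (ArithmeticException e) {
        throw new IllegalArgumentException(
            "Cannot encode decimal with scale " + value.scale()
                + " as scale " + schemaScale + " without rounding", e);
      }
    }
    return value;
  }

  // Decimal-on-BYTES stores the unscaled value as big-endian two's-complement
  // bytes; the scale itself is only recorded in the schema.
  static byte[] toBytes(BigDecimal value, int schemaScale) {
    return validate(value, schemaScale).unscaledValue().toByteArray();
  }

  public static void main(String[] args) {
    BigDecimal amount = new BigDecimal("12345.67"); // a DECIMAL(21,2) value

    // Scale lost in the schema conversion (scale 0): encoding fails.
    try {
      toBytes(amount, 0);
      System.out.println("unexpected: encoded with scale 0");
    } catch (IllegalArgumentException e) {
      // prints "Cannot encode decimal with scale 2 as scale 0 without rounding"
      System.out.println(e.getMessage());
    }

    // Scale preserved (scale 2): the unscaled value is encoded.
    byte[] bytes = toBytes(amount, 2);
    System.out.println(new BigInteger(bytes)); // prints 1234567
  }
}
```

With this logic a value such as 100.00 still encodes at scale 0 (it rescales to 100 without rounding), so when the scale is lost the error only shows up for rows that have non-zero fractional digits.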
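A JDK-only sketch of the INT16 problem and of the suggested fix. Both helpers are hypothetical stand-ins, not Beam's code: convertIntStrict imitates the plain (Integer) cast that the stack trace points to in convertAvroFieldStrict, and convertIntLenient is the Number.intValue() variant suggested above.

```java
final class ShortToIntSketch {

  // Hypothetical stand-in for a strict conversion to an Avro "int":
  // a direct cast, which fails when the Beam row holds a java.lang.Short.
  static Integer convertIntStrict(Object value) {
    return (Integer) value;
  }

  // Hypothetical lenient variant: accept any Number and take its int value.
  static Integer convertIntLenient(Object value) {
    return ((Number) value).intValue();
  }

  public static void main(String[] args) {
    Object int16Value = (short) 42; // autoboxed to java.lang.Short, as for INT16

    try {
      convertIntStrict(int16Value);
      System.out.println("unexpected: strict cast succeeded");
    } catch (ClassCastException e) {
      System.out.println("strict: ClassCastException");
    }

    System.out.println("lenient: " + convertIntLenient(int16Value)); // prints "lenient: 42"
  }
}
```

One caveat with accepting any Number: intValue() silently truncates a Long or a Double, so it might be safer to widen only Byte and Short and keep failing on anything else.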
--
Sincerely yours
Mikhail Khludnev