For TINYINT and SMALLINT, I don't think there is any advantage at the
storage layer. Avro uses variable-length ints, and the columnar formats,
Parquet and ORC, encode runs of values in a column efficiently. I don't
see much value in these types, besides compatibility with
existing SQL.
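
To make the variable-length point concrete, here is a minimal Python sketch
of the zig-zag varint encoding that the Avro specification describes for int
values. The function names (zigzag, encode_avro_int) are made up for this
illustration and are not from any Avro library; it only shows how many bytes
a given value takes, not how a real writer is implemented.

```python
def zigzag(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small."""
    # Python's >> on a negative int is arithmetic, so n >> 31 is -1 or 0.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_avro_int(n: int) -> bytes:
    """Encode n as a zig-zag varint: 7 payload bits per byte, low bits first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


if __name__ == "__main__":
    # Sample values around the TINYINT, SMALLINT and INT boundaries.
    for value in (0, -1, 63, 64, 127, -128, 32767, 2**31 - 1):
        print(value, len(encode_avro_int(value)))
```

In this sketch, values in the TINYINT range come out as one or two bytes and
values in the SMALLINT range as at most three, so an INT column holding small
numbers does not pay the full four bytes per value.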
Just to update this thread, we have agreed internally to use INT in the
struct schema corresponding to union types. The reasons are two-fold:
(1) It is uncertain whether TINYINT will make it into Iceberg, and we
want to stick to the spec.
(2) Since Avro does not support TINYINT either, this iss
I also wanted to add an observation on the flip side of the initial
argument. Some data formats, like Avro, do not support TINYINT
either. So even if Trino uses TINYINT in the struct schema, when the table
is written back to Avro, TINYINT will not be used. This supports the side
of the ar