Hi Julien,

Your question again reminds me about the things we were facing with Iceberg :)

In Iceberg there is a possibility to define `required` columns, and we thought 
it would be a good idea to convert these columns to `NOT NULL` columns in Hive.
We tried to use the HiveMetaHook API to do this, but we later found that the 
constraints are handled separately from base table metadata manipulation. So 
basically at table creation time the we do not know anything about the 
constraints.

For me the issue you are mentioning seems similar in a sense that constraints 
are not available in the SerDe.

I think it would be good to check the handling of the constraints in the 
different APIs, but if the current version is not good enough this would 
definitely need a version upgrade.

Anyway, if you figure out something, I would be interested in the results too :)

Thanks,
Peter

> On 2022. May 5., at 1:18, Julien Phalip <jpha...@gmail.com> wrote:
> 
> Hi,
> 
> I'm trying to create a table with a NOT NULL column:  CREATE TABLE mytable 
> (int_required BIGINT NOT NULL, ....)
> 
> However, it looks like the schema that AvroSerde generates ignores the "NOT 
> NULL" part and outputs the following avro field schema: 
> {"name":"int_required","type":["null","long"],"default":null}
> 
> Looking at AvroSerde's codebase, it seems that it infers the avro schema 
> based on the "column.types" table property: 
> https://github.com/apache/hive/blob/release-3.1.2-rc0/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L118
>  
> <https://github.com/apache/hive/blob/release-3.1.2-rc0/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L118>
> 
> However, it looks like the "column.types" table property contains no 
> indication of whether the column might be "NOT NULL". The property just 
> contains "bigint" for the column. So the AvroSerde assumes that all columns 
> must accept null values.
> 
> This means that the AvroSerde's inferred schema for that field is a 
> UNION["null", "long"] instead of just LONG.
> 
> Is that the intended behavior, or am I missing something?
> 
> Thanks,
> 
> Julien

Reply via email to