Hi Marton,

I think this hasn't come up before because Iceberg will reject a Spark operation that attempts to write an optional field to a required column. I think we added a way around that, but then we are assuming that the file formats catch it, and clearly ORC doesn't. And the strategy of validating the write doesn't work for Hive because it considers all fields optional. I think the solution here is to update the ORC writer to enforce required fields.
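As a rough illustration of what "enforce required fields" could look like, here is a minimal sketch of a null-checking decorator around a per-column value writer, in the spirit of what the Avro path already does. All class, interface, and method names below are hypothetical stand-ins, not the actual Iceberg ORC writer API:

```java
// Hypothetical sketch: a decorator that rejects nulls for required columns
// before the value ever reaches the underlying file-format writer.
public class RequiredFieldSketch {

  // Minimal stand-in for a per-column value writer (not the Iceberg interface).
  interface ValueWriter<T> {
    void write(T value);
  }

  // Wraps a delegate writer so that required fields fail fast on null,
  // with an error message that names the offending field.
  static <T> ValueWriter<T> required(String fieldName, ValueWriter<T> delegate) {
    return value -> {
      if (value == null) {
        throw new IllegalArgumentException(
            "Cannot write null to required column: " + fieldName);
      }
      delegate.write(value);
    };
  }

  public static void main(String[] args) {
    StringBuilder sink = new StringBuilder();
    ValueWriter<String> writer = required("name", sink::append);

    writer.write("alice"); // non-null value passes through to the delegate
    try {
      writer.write(null);  // rejected before reaching the file format
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The same descriptive exception could serve the standardization point below: each format's writer would surface the same Iceberg-level error with the field name, instead of format-specific failures or NPEs.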
It would also be nice to catch these exceptions and standardize them with an Iceberg exception that explains what happened and includes the field name. But that's not strictly necessary for correctness.

Ryan

On Fri, Jun 11, 2021 at 6:25 AM Marton Bod <m...@cloudera.com.invalid> wrote:
> Hi Team,
>
> We've recently started testing the behaviour of required vs optional
> fields, by creating a table with a required column and trying to insert a
> NULL value into it. We've found differing behaviour for each file format:
>
>    - ORC: it allows inserting NULLs into required fields and allows
>      reading them back as well
>    - AVRO: it fails at write time, with a seemingly correct error
>      message: "java.lang.IllegalArgumentException: Cannot write null to
>      required string column"
>    - PARQUET: it fails at write time too, but with an NPE:
>      java.lang.NullPointerException at
>      org.apache.iceberg.parquet.ParquetValueWriters$StringWriter.write(ParquetValueWriters.java:326)
>
> Is this a known issue or did we perhaps do something incorrectly in our
> tests?
> Thanks a lot for any guidance.
>
> Best,
> Marton

--
Ryan Blue