Hi all, I noticed that Iceberg's predicates are not compatible with SQL predicates when it comes to handling NULL values. In SQL, if any of the operands of a scalar comparison predicate is NULL, then the resultant truth value of the predicate is UNKNOWN. e.g. `SELECT NULL != 1` will return a NULL in SQL and not FALSE. If such predicates are used as filters, the resultant output will be different for Iceberg v/s SQL. e.g. `.filter(notEqual(column, 'x'))` in Iceberg will return rows excluding ‘x’ but including NULL. The same thing in Presto SQL `WHERE column != 'x'` will return rows excluding both ‘x’ and NULL. So essentially, Iceberg can return more rows than required when an engine pushes down these predicates, however the engines will filter out these additional rows, so everything seems good. But modules like iceberg-data and iceberg-mr which rely solely on Iceberg's expression evaluators for filtering will return the additional rows. Should we change the behavior of Iceberg expressions to be more SQL-like or should we keep this behavior and document the differences when compared with SQL?
This also has some implications on predicate pushdown e.g. ORC follows SQL semantics and if we try to push down Iceberg predicates, simply converting Iceberg's 'NOT EQUAL' to ORC's 'NOT EQUAL' will be insufficient as it does not return NULLs contrary to what Iceberg expects. Thanks, Shardul