Hi,

I have some questions on the current Iceberg spec regarding equality
deletes:
https://iceberg.apache.org/spec/#equality-delete-files
The spec says that for "a table with the following data:

 1: id | 2: category | 3: name
-------|-------------|---------
 1     | marsupial   | Koala
 2     | toy         | Teddy
 3     | NULL        | Grizzly
 4     | NULL        | Polar

The delete id = 3 could be written as either of the following equality
delete files:

equality_ids=[1]

 1: id
-------
 3

equality_ids=[1]

 1: id | 2: category | 3: name
-------|-------------|---------
 3     | NULL        | Grizzly

"

1. Are the only options (a) writing just the column(s) listed in
equality_ids or (b) writing all the columns, i.e., nothing in between?
2. If we write all the columns, are only columns listed in equality_ids
considered? What happens if a non-equality_id column does not match? e.g.,

equality_ids=[1]

 1: id | 2: category | 3: name
-------|-------------|---------
 3     | NULL        | Polar

Is that (a) invalid, or does that (b) still result in deleting id = 3, or
(c) result in deleting no rows?

The spec says "Each row of the delete file produces one equality predicate
that matches any row where the delete columns are equal. Multiple columns
can be thought of as an AND of equality predicates." That could be
interpreted to mean (c).
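
To make readings (b) and (c) concrete, here is a rough Python sketch of the
two behaviours I have in mind. It is only an illustration of the question,
not a statement of what the spec or any implementation does. The names
(ROWS, matches, apply_delete, compare_all_columns) are made up for this
email, rows are plain dicts keyed by field id, and I assume NULL compares
equal to NULL.

```python
# Table rows from the spec example, keyed by field id (1: id, 2: category, 3: name).
ROWS = [
    {1: 1, 2: "marsupial", 3: "Koala"},
    {1: 2, 2: "toy", 3: "Teddy"},
    {1: 3, 2: None, 3: "Grizzly"},
    {1: 4, 2: None, 3: "Polar"},
]


def matches(row, delete_row, field_ids):
    """AND of equality predicates over field_ids (assumes NULL == NULL)."""
    return all(row[f] == delete_row[f] for f in field_ids)


def apply_delete(rows, delete_row, equality_ids, compare_all_columns):
    """Return the rows that survive one equality-delete row.

    compare_all_columns=False models reading (b): only equality_ids are compared.
    compare_all_columns=True models reading (c): every written column is compared.
    """
    fields = list(delete_row) if compare_all_columns else equality_ids
    return [r for r in rows if not matches(r, delete_row, fields)]


# The delete row from question 2: id = 3, category = NULL, name = Polar.
delete_row = {1: 3, 2: None, 3: "Polar"}

reading_b = apply_delete(ROWS, delete_row, [1], compare_all_columns=False)
reading_c = apply_delete(ROWS, delete_row, [1], compare_all_columns=True)

print([r[1] for r in reading_b])  # [1, 2, 4]     (id = 3 is deleted)
print([r[1] for r in reading_c])  # [1, 2, 3, 4]  (nothing is deleted)
```

Under reading (b) the extra columns are ignored and id = 3 is removed; under
reading (c) the name mismatch (Polar vs. Grizzly) means no row is removed.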

Thanks,
Wing Yew
