Thanks Vladimir! Would you like to open a PR to make that change? It sounds
like another good item to put into the "Implementation notes" section.

On Sun, Oct 20, 2024 at 11:41 PM Vladimir Ozerov <voze...@querifylabs.com>
wrote:

> Hi Jean-Baptiste,
>
> Agreed. REST spec looks good. I am talking about the general spec, where
> it might be useful to add a hint to engine developers, that CREATE OR
> REPLACE semantics in Iceberg is expected to follow slightly different
> semantics. This is already broken in Trino: depending on catalog type users
> may get either classical "DROP + CREATE" (for non-REST catalogs), or
> "CREATE AND UPDATE" for REST catalog. For Flink, their official docs say
> that CREATE OR REPLACE == DROP + CREATE, while for Iceberg tables this
> should not be the case. These are definitively things that should be fixed
> at engine levels. But at the same time it highlights that engine developers
> are having hard time defining proper semantics for CREATE OR REPLACE in the
> Iceberg integrations, so a paragraph or so in the main Iceberg spec may
> help us align expectations.
>
> Regards,
> Vladimir.
>
> On Mon, Oct 21, 2024 at 8:28 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Vladimir,
>>
>> As Ryan said, it's not a bug: CREATE OR REPLACE can be seen as "CREATE
>> AND UPDATE" from table format perspective. Specifically for the
>> properties, it makes sense to not delete the current properties as it
>> can be used in several use cases (security, tables grouping, ...).
>> I'm not sure a REST Spec update is required, probably more on the
>> engine side. In the REST Spec, you can create a table
>> (
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L553
>> )
>> and update a table
>> (
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975
>> ),
>> and it's up to the query engine to implement the "CREATE OR REPLACE"
>> with the correct semantic.
>>
>> Regards
>> JB
>>
>> On Sun, Oct 20, 2024 at 9:26 PM Vladimir Ozerov <voze...@querifylabs.com>
>> wrote:
>> >
>> > Hi Ryan,
>> >
>> > Thanks for the clarification. Yes, I think my confusion was caused by
>> the fact that many engines treat CREATE OR REPLACE as a semantic equivalent
>> of DROP + CREATE, which is performed atomically (e.g., Flink [1]). Table
>> formats add history on top of that, which is expected to be retained, no
>> questions here. Permission propagation also make sense. For properties
>> things become a bit blurry, because on the one hand there are Iceberg
>> specific properties, which may affect table maintenance, and on the other
>> hand there are user-defined properties in the same bag. The question
>> appeared in the first place because I observed a discrepancy in Trino: all
>> catalogs except for REST completely overrides table properties on REPLACE,
>> and REST catalog merges them, which might be confusing to end users.
>> Perhaps some clarification at the spec level might be useful, because
>> without agreement between engines the could be some subtle bugs in
>> multi-engine environments, such as sudden data format changes between
>> replaces, etc.
>> >
>> > [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-or-replace-table
>> >
>> > Regards,
>> > Vladimir.
>> >
>> > On Sun, Oct 20, 2024 at 9:20 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>> >>
>> >> Hi Vladimir,
>> >>
>> >> This isn't a bug. The behavior of CREATE OR REPLACE is to replace the
>> data of a table, but to maintain things like other refs, snapshot history,
>> permissions (if supported by the catalog), and table properties. Table
>> properties are replaced if they are set in the operation like `b` in your
>> example. This is not the same as a drop and create, which may be what you
>> want instead.
>> >>
>> >> The reason for this behavior is that the CREATE OR REPLACE operation
>> is used to replace a table's data without needing to handle schema changes
>> between versions. For example, producing a daily report table that replaces
>> the previous day. However, the table still exists and it is valuable to be
>> able to time travel to older versions or to be able to use branches and
>> tags. Clearly, that means that table history and refs stick around so the
>> table is not completely new every time it is replaced.
>> >>
>> >> Adding on to that, properties control things like ref and snapshot
>> retention, file format, compression, and other settings. These aren't
>> settings that need to be carried through in every replace operation. And it
>> would make no sense if you set the snapshot retention because older
>> snapshots are retained, only to have it discarded the next time you replace
>> the table data. A good way to think about this is that table properties are
>> set infrequently, while table data changes regularly. And the person
>> changing the data may not be the person tuning the table settings.
>> >>
>> >> Hopefully that helps,
>> >>
>> >> Ryan
>> >>
>> >> On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov <
>> voze...@querifylabs.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Consider a REST catalog and a user calls "CREATE OR REPLACE <table>"
>> command. When processing the command, engines will usually initiate a
>> "createOrReplace" transaction and add metadata, such as the properties of a
>> new table.
>> >>>
>> >>> Users expect a table to be replaced with a new one if it exists,
>> including properties. However, I observe the following:
>> >>>
>> >>> RESTSessionCatalog loads previous table metadata, adds new properties
>> (MetadataUpdate.SetProperties), and invokes the backend
>> >>> The backend (e.g., Polaris) will typically invoke
>> "CatalogHandler.updateTable." There, the previous table state, including
>> its properties, is loaded
>> >>> Finally, metadata updates are applied, and old table properties are
>> merged with new ones. That is, if the old table has properties [a=1, b=2],
>> and the new table has properties [b=3, c=4], then the final properties
>> would be [a=1, b=3, c=4], while the user expects [b=3, c=4].
>> >>>
>> >>> It looks like a bug because the user expects complete property
>> replacement instead of a merge. Shall we explicitly clear all previous
>> properties in RESTSessionCatalog.Builder.replaceTransaction?
>> >>>
>> >>> Regards,
>> >>> Vladimir.
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Vladimir Ozerov
>> >>> Founder
>> >>> querifylabs.com
>> >
>> >
>> >
>> > --
>> > Vladimir Ozerov
>> > Founder
>> > querifylabs.com
>>
>
>
> --
> *Vladimir Ozerov*
> Founder
> querifylabs.com
>

Reply via email to