Hi Jean-Baptiste, Agreed. REST spec looks good. I am talking about the general spec, where it might be useful to add a hint to engine developers, that CREATE OR REPLACE semantics in Iceberg is expected to follow slightly different semantics. This is already broken in Trino: depending on catalog type users may get either classical "DROP + CREATE" (for non-REST catalogs), or "CREATE AND UPDATE" for REST catalog. For Flink, their official docs say that CREATE OR REPLACE == DROP + CREATE, while for Iceberg tables this should not be the case. These are definitively things that should be fixed at engine levels. But at the same time it highlights that engine developers are having hard time defining proper semantics for CREATE OR REPLACE in the Iceberg integrations, so a paragraph or so in the main Iceberg spec may help us align expectations.
Regards, Vladimir. On Mon, Oct 21, 2024 at 8:28 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Vladimir, > > As Ryan said, it's not a bug: CREATE OR REPLACE can be seen as "CREATE > AND UPDATE" from table format perspective. Specifically for the > properties, it makes sense to not delete the current properties as it > can be used in several use cases (security, tables grouping, ...). > I'm not sure a REST Spec update is required, probably more on the > engine side. In the REST Spec, you can create a table > ( > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L553 > ) > and update a table > ( > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975 > ), > and it's up to the query engine to implement the "CREATE OR REPLACE" > with the correct semantic. > > Regards > JB > > On Sun, Oct 20, 2024 at 9:26 PM Vladimir Ozerov <voze...@querifylabs.com> > wrote: > > > > Hi Ryan, > > > > Thanks for the clarification. Yes, I think my confusion was caused by > the fact that many engines treat CREATE OR REPLACE as a semantic equivalent > of DROP + CREATE, which is performed atomically (e.g., Flink [1]). Table > formats add history on top of that, which is expected to be retained, no > questions here. Permission propagation also make sense. For properties > things become a bit blurry, because on the one hand there are Iceberg > specific properties, which may affect table maintenance, and on the other > hand there are user-defined properties in the same bag. The question > appeared in the first place because I observed a discrepancy in Trino: all > catalogs except for REST completely overrides table properties on REPLACE, > and REST catalog merges them, which might be confusing to end users. > Perhaps some clarification at the spec level might be useful, because > without agreement between engines the could be some subtle bugs in > multi-engine environments, such as sudden data format changes between > replaces, etc. > > > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-or-replace-table > > > > Regards, > > Vladimir. > > > > On Sun, Oct 20, 2024 at 9:20 PM rdb...@gmail.com <rdb...@gmail.com> > wrote: > >> > >> Hi Vladimir, > >> > >> This isn't a bug. The behavior of CREATE OR REPLACE is to replace the > data of a table, but to maintain things like other refs, snapshot history, > permissions (if supported by the catalog), and table properties. Table > properties are replaced if they are set in the operation like `b` in your > example. This is not the same as a drop and create, which may be what you > want instead. > >> > >> The reason for this behavior is that the CREATE OR REPLACE operation is > used to replace a table's data without needing to handle schema changes > between versions. For example, producing a daily report table that replaces > the previous day. However, the table still exists and it is valuable to be > able to time travel to older versions or to be able to use branches and > tags. Clearly, that means that table history and refs stick around so the > table is not completely new every time it is replaced. > >> > >> Adding on to that, properties control things like ref and snapshot > retention, file format, compression, and other settings. These aren't > settings that need to be carried through in every replace operation. And it > would make no sense if you set the snapshot retention because older > snapshots are retained, only to have it discarded the next time you replace > the table data. A good way to think about this is that table properties are > set infrequently, while table data changes regularly. And the person > changing the data may not be the person tuning the table settings. > >> > >> Hopefully that helps, > >> > >> Ryan > >> > >> On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov < > voze...@querifylabs.com> wrote: > >>> > >>> Hi, > >>> > >>> Consider a REST catalog and a user calls "CREATE OR REPLACE <table>" > command. When processing the command, engines will usually initiate a > "createOrReplace" transaction and add metadata, such as the properties of a > new table. > >>> > >>> Users expect a table to be replaced with a new one if it exists, > including properties. However, I observe the following: > >>> > >>> RESTSessionCatalog loads previous table metadata, adds new properties > (MetadataUpdate.SetProperties), and invokes the backend > >>> The backend (e.g., Polaris) will typically invoke > "CatalogHandler.updateTable." There, the previous table state, including > its properties, is loaded > >>> Finally, metadata updates are applied, and old table properties are > merged with new ones. That is, if the old table has properties [a=1, b=2], > and the new table has properties [b=3, c=4], then the final properties > would be [a=1, b=3, c=4], while the user expects [b=3, c=4]. > >>> > >>> It looks like a bug because the user expects complete property > replacement instead of a merge. Shall we explicitly clear all previous > properties in RESTSessionCatalog.Builder.replaceTransaction? > >>> > >>> Regards, > >>> Vladimir. > >>> > >>> > >>> > >>> -- > >>> Vladimir Ozerov > >>> Founder > >>> querifylabs.com > > > > > > > > -- > > Vladimir Ozerov > > Founder > > querifylabs.com > -- *Vladimir Ozerov* Founder querifylabs.com