I second Ryan here, it would be great to clarify in the
"implementation notes" section.

Thanks !
Regards
JB

On Wed, Oct 23, 2024 at 1:10 AM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
> Thanks Vladimir! Would you like to open a PR to make that change? It sounds 
> like another good item to put into the "Implementation notes" section.
>
> On Sun, Oct 20, 2024 at 11:41 PM Vladimir Ozerov <voze...@querifylabs.com> 
> wrote:
>>
>> Hi Jean-Baptiste,
>>
>> Agreed. REST spec looks good. I am talking about the general spec, where it 
>> might be useful to add a hint to engine developers, that CREATE OR REPLACE 
>> semantics in Iceberg is expected to follow slightly different semantics. 
>> This is already broken in Trino: depending on catalog type users may get 
>> either classical "DROP + CREATE" (for non-REST catalogs), or "CREATE AND 
>> UPDATE" for REST catalog. For Flink, their official docs say that CREATE OR 
>> REPLACE == DROP + CREATE, while for Iceberg tables this should not be the 
>> case. These are definitively things that should be fixed at engine levels. 
>> But at the same time it highlights that engine developers are having hard 
>> time defining proper semantics for CREATE OR REPLACE in the Iceberg 
>> integrations, so a paragraph or so in the main Iceberg spec may help us 
>> align expectations.
>>
>> Regards,
>> Vladimir.
>>
>> On Mon, Oct 21, 2024 at 8:28 AM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> As Ryan said, it's not a bug: CREATE OR REPLACE can be seen as "CREATE
>>> AND UPDATE" from table format perspective. Specifically for the
>>> properties, it makes sense to not delete the current properties as it
>>> can be used in several use cases (security, tables grouping, ...).
>>> I'm not sure a REST Spec update is required, probably more on the
>>> engine side. In the REST Spec, you can create a table
>>> (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L553)
>>> and update a table
>>> (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975),
>>> and it's up to the query engine to implement the "CREATE OR REPLACE"
>>> with the correct semantic.
>>>
>>> Regards
>>> JB
>>>
>>> On Sun, Oct 20, 2024 at 9:26 PM Vladimir Ozerov <voze...@querifylabs.com> 
>>> wrote:
>>> >
>>> > Hi Ryan,
>>> >
>>> > Thanks for the clarification. Yes, I think my confusion was caused by the 
>>> > fact that many engines treat CREATE OR REPLACE as a semantic equivalent 
>>> > of DROP + CREATE, which is performed atomically (e.g., Flink [1]). Table 
>>> > formats add history on top of that, which is expected to be retained, no 
>>> > questions here. Permission propagation also make sense. For properties 
>>> > things become a bit blurry, because on the one hand there are Iceberg 
>>> > specific properties, which may affect table maintenance, and on the other 
>>> > hand there are user-defined properties in the same bag. The question 
>>> > appeared in the first place because I observed a discrepancy in Trino: 
>>> > all catalogs except for REST completely overrides table properties on 
>>> > REPLACE, and REST catalog merges them, which might be confusing to end 
>>> > users. Perhaps some clarification at the spec level might be useful, 
>>> > because without agreement between engines the could be some subtle bugs 
>>> > in multi-engine environments, such as sudden data format changes between 
>>> > replaces, etc.
>>> >
>>> > [1] 
>>> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-or-replace-table
>>> >
>>> > Regards,
>>> > Vladimir.
>>> >
>>> > On Sun, Oct 20, 2024 at 9:20 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>> >>
>>> >> Hi Vladimir,
>>> >>
>>> >> This isn't a bug. The behavior of CREATE OR REPLACE is to replace the 
>>> >> data of a table, but to maintain things like other refs, snapshot 
>>> >> history, permissions (if supported by the catalog), and table 
>>> >> properties. Table properties are replaced if they are set in the 
>>> >> operation like `b` in your example. This is not the same as a drop and 
>>> >> create, which may be what you want instead.
>>> >>
>>> >> The reason for this behavior is that the CREATE OR REPLACE operation is 
>>> >> used to replace a table's data without needing to handle schema changes 
>>> >> between versions. For example, producing a daily report table that 
>>> >> replaces the previous day. However, the table still exists and it is 
>>> >> valuable to be able to time travel to older versions or to be able to 
>>> >> use branches and tags. Clearly, that means that table history and refs 
>>> >> stick around so the table is not completely new every time it is 
>>> >> replaced.
>>> >>
>>> >> Adding on to that, properties control things like ref and snapshot 
>>> >> retention, file format, compression, and other settings. These aren't 
>>> >> settings that need to be carried through in every replace operation. And 
>>> >> it would make no sense if you set the snapshot retention because older 
>>> >> snapshots are retained, only to have it discarded the next time you 
>>> >> replace the table data. A good way to think about this is that table 
>>> >> properties are set infrequently, while table data changes regularly. And 
>>> >> the person changing the data may not be the person tuning the table 
>>> >> settings.
>>> >>
>>> >> Hopefully that helps,
>>> >>
>>> >> Ryan
>>> >>
>>> >> On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov 
>>> >> <voze...@querifylabs.com> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Consider a REST catalog and a user calls "CREATE OR REPLACE <table>" 
>>> >>> command. When processing the command, engines will usually initiate a 
>>> >>> "createOrReplace" transaction and add metadata, such as the properties 
>>> >>> of a new table.
>>> >>>
>>> >>> Users expect a table to be replaced with a new one if it exists, 
>>> >>> including properties. However, I observe the following:
>>> >>>
>>> >>> RESTSessionCatalog loads previous table metadata, adds new properties 
>>> >>> (MetadataUpdate.SetProperties), and invokes the backend
>>> >>> The backend (e.g., Polaris) will typically invoke 
>>> >>> "CatalogHandler.updateTable." There, the previous table state, 
>>> >>> including its properties, is loaded
>>> >>> Finally, metadata updates are applied, and old table properties are 
>>> >>> merged with new ones. That is, if the old table has properties [a=1, 
>>> >>> b=2], and the new table has properties [b=3, c=4], then the final 
>>> >>> properties would be [a=1, b=3, c=4], while the user expects [b=3, c=4].
>>> >>>
>>> >>> It looks like a bug because the user expects complete property 
>>> >>> replacement instead of a merge. Shall we explicitly clear all previous 
>>> >>> properties in RESTSessionCatalog.Builder.replaceTransaction?
>>> >>>
>>> >>> Regards,
>>> >>> Vladimir.
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Vladimir Ozerov
>>> >>> Founder
>>> >>> querifylabs.com
>>> >
>>> >
>>> >
>>> > --
>>> > Vladimir Ozerov
>>> > Founder
>>> > querifylabs.com
>>
>>
>>
>> --
>> Vladimir Ozerov
>> Founder
>> querifylabs.com

Reply via email to