Hi Vladimir,

This isn't a bug. CREATE OR REPLACE replaces the data of a table but keeps
state such as other refs, snapshot history, permissions (if supported by the
catalog), and table properties. A table property is overwritten only if it
is set in the operation, like `b` in your example. This is not the same as a
drop and create, which may be what you want instead.
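
To make the difference concrete, here is a rough Python sketch of the two
behaviors using the properties from your example. It only models the
semantics described above; the function names are made up for this
illustration and are not Iceberg APIs.

```python
def replace_properties(existing, requested):
    """Hypothetical model of CREATE OR REPLACE property handling."""
    merged = dict(existing)   # properties already on the table are kept
    merged.update(requested)  # properties set in the operation overwrite them
    return merged


def drop_and_create(existing, requested):
    """Hypothetical model of DROP followed by CREATE: nothing carries over."""
    return dict(requested)


old = {"a": "1", "b": "2"}
new = {"b": "3", "c": "4"}

print(replace_properties(old, new))  # {'a': '1', 'b': '3', 'c': '4'}
print(drop_and_create(old, new))     # {'b': '3', 'c': '4'}
```

The second result is what you expected; the first is what CREATE OR REPLACE
is intended to produce.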

The reason for this behavior is that the CREATE OR REPLACE operation is
used to replace a table's data without needing to handle schema changes
between versions. For example, a job might produce a daily report table
that replaces the previous day's. The table still exists, though, and it is
valuable to be able to time travel to older versions or to use branches and
tags. That means table history and refs stick around, so the table is not
completely new every time it is replaced.
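
As a toy illustration of that point (a simplified Python model, not
Iceberg's actual metadata classes; the `Table` class and its fields are
assumptions made for this sketch), a replace adds a new snapshot and moves
`main`, while earlier snapshots and other refs stay in place:

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy model of table metadata for illustration only."""
    snapshots: list = field(default_factory=list)  # history, oldest first
    refs: dict = field(default_factory=dict)       # branch/tag name -> snapshot id

    def create_or_replace(self, data):
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, data))  # history grows, nothing is dropped
        self.refs["main"] = snapshot_id             # only main moves to the new data
        return snapshot_id


report = Table()
monday = report.create_or_replace("report for Monday")
report.refs["monday-tag"] = monday                  # tag the first version
report.create_or_replace("report for Tuesday")

print(report.refs)            # {'main': 2, 'monday-tag': 1}
print(len(report.snapshots))  # 2
```

The tag still points at Monday's snapshot after the replace, which is what
makes time travel to the older report possible.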

In addition, properties control things like ref and snapshot retention,
file format, compression, and other settings. These aren't settings that
should have to be passed again in every replace operation. It would make no
sense to set snapshot retention, which matters precisely because older
snapshots are retained, only to have that setting discarded the next time
you replace the table data. A good way to think about this is that table
properties are set infrequently, while table data changes regularly, and
the person changing the data may not be the person tuning the table
settings.

Hopefully that helps,

Ryan

On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov <voze...@querifylabs.com>
wrote:

> Hi,
>
> Consider a REST catalog where a user calls the "CREATE OR REPLACE <table>"
> command. When processing the command, engines will usually initiate a
> "createOrReplace" transaction and add metadata, such as the properties of a
> new table.
>
> Users expect a table to be replaced with a new one if it exists,
> including properties. However, I observe the following:
>
>    1. RESTSessionCatalog loads previous table metadata, adds new
>    properties (MetadataUpdate.SetProperties), and invokes the backend
>    2. The backend (e.g., Polaris) will typically invoke
>    "CatalogHandler.updateTable." There, the previous table state, including
>    its properties, is loaded
>    3. Finally, metadata updates are applied, and old table properties are
>    merged with new ones. That is, if the old table has properties [a=1, b=2],
>    and the new table has properties [b=3, c=4], then the final properties
>    would be [a=1, b=3, c=4], while the user expects [b=3, c=4].
>
> It looks like a bug because the user expects complete property replacement
> instead of a merge. Shall we explicitly clear all previous properties
> in RESTSessionCatalog.Builder.replaceTransaction?
>
> Regards,
> Vladimir.
>
>
>
> --
> *Vladimir Ozerov*
> Founder
> querifylabs.com
>
