Hi,

When working on a time-travel in Trino I noticed the behavior which seems
counter-intuitive from the user perspective. Consider the following
sequence of SQL queries executed from Trino (could be any other query
engine). Here we create a table, populate it with some data, and then add
another column. The third operation is implemented in Trino via
Table.updateSchema().addColumn(...).commit().

CREATE TABLE t (a);        // v1
INSERT INTO t VALUES (1);  // v2
ALTER TABLE t ADD COLUMN b // v3
UPDATE t SET b = 2;        // v4

The trick is that the third operation creates a schema, but *doesn't*
create a snapshot. After that we endup with the following table metadata:
{
    snapshots: {
        v1: { schemaId: 0 } // Initial create
        v2: { schemaId: 0 } // Insert using the old schema
        v4: { schemaId: 1 } // Update using the new schema
    }
    schemaIds: { 0, 1 }
    currentSnapshot: v4
    currentSchema: 1
}

What happens next in Trino:

SELECT * FROM t;                  // Executed on { snapshot=v4, schemaId=1
}, returns { a=1, b=2 } as expected
SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2, schemaId=0
}, throws exception: column "b" is not found

That is, we cannot query the table after the new column is added, but
before it is updated. This happens because there is no dedicated snapshot
for UpdateSchema action. To make things even more confusing, if we do not
have the operation v4 then we will be able to query the v3 state, because
engines will basically "reconstruct" it as a virtual snapshot using
TableMetadata.currentSnapshot and TableMetadata.currentSchema:

SELECT * FROM t;                  // Executed on { snapshot=v2, schemaId=1
}, returns { a=1, b=null } as expected
SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2, schemaId=0
}, throws exception: column "b" is not found

My question is: shouldn't we always implicitly add new snapshots for all
schema changes? Because otherwise we cannot read all existing table states
which is confusing. Of course, this could be handled on the engine side,
but this will lead to inconsistent behaviour between different engines
depending on their internal implementation details.

Regards,
-- 
*Vladimir Ozerov*

Reply via email to