Hey Cassandra folks, I'm trying to change the schema of an existing table by creating a new one and migrating the data.
The initial table schema looks like this: *CREATE TABLE IF NOT EXISTS initial_table ( user_id text, message_id timeuuid, interaction_state text, interaction_timestamp timestamp, PRIMARY KEY ((user_id), message_id, interaction_state, interaction_timestamp));* We're trying to remove interaction timestamp from the PK - same schema but with *PRIMARY KEY ((user_id), message_id, interaction_state)* When importing the .csv dump obtained from the *initial_table, *the timestamp column seems to be written in a weird way. Multiple rows from the old schema need to be merged to a single entry of the new schema. For most cases, it seems the last entry entry gets copied over to the new table while for others a random one gets copied. Check out the below csv sample and the copy from result. *initial_table_dump.csv* *123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-03 17:50:59+0000123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-10 00:05:41+0000* *copy new_table(user_id, message_id, interaction_state, interaction_timestamp) from '~/initial_table_dump.csv';* *Result:* *user_id | message_id | interaction_state | interaction_timestamp---------+--------------------------------------+-------------------+-------------------------- 123 | ed6c69a0-0add-11b2-8080-808080808080 | DISMISSED | 2020-01-03 17:50:59+0000* Notice the first row from the csv gets written into the new table in this case - here there are only two rows, but for multiple ones it seems a random one would be copied over, not the first/last one necessarily. When updating the interaction_timestamp column value as below, it seems to copy the latest entry to the new table. *initial_table_dump_2.csv* *123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-03 17:50:59+0000123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-05 00:05:41+0000* *- perform same copy from operation - * *Result:* *user_id | message_id | interaction_state | interaction_timestamp---------+--------------------------------------+-------------------+-------------------------- 123 | ed6c69a0-0add-11b2-8080-808080808080 | DISMISSED | **2020-01-05 00:05:41+0000* Could someone help me understand why this might happen? Does the 'copy from' follow the order from the csv when doing the import or there are no order guarantees? I'm using the below cqlsh and Cassandra versions: *[cqlsh 5.0.1 | Cassandra 2.2.15 | CQL spec 3.3.1 | Native protocol v4]* Thanks, Bogdan