I'm reviving this thread because I'm looking for a non-hacky way to migrate
data from one cluster to another using nodetool snapshot and sstableloader
without having to preserve dropped columns in the new schema. In my view,
that's just cruft and confusion that keeps building.

The best idea I can come up with is to do the following in the source
cluster:

   1. Use the cqlsh COPY TO command to export the data in the table.
   2. Drop the table.
   3. Re-create the table.
   4. Use the cqlsh COPY FROM command to import the data into the new
   incarnation of the table.
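
To make the sequence concrete, here is a rough sketch in Python that only
builds the four cqlsh invocations (COPY ... TO writes the CSV, COPY ... FROM
loads it) and prints them without running anything. The helper name, the
keyspace/table/column names, the CSV path, and the host are all hypothetical
placeholders, and I'm assuming cqlsh is invoked with -e to execute a single
statement.

```python
import shlex


def build_copy_roundtrip(keyspace, table, create_stmt, csv_path,
                         columns=None, host="127.0.0.1"):
    """Return cqlsh argv lists for: export, drop, re-create, import.

    Nothing is executed here; the caller decides whether to run them.
    All arguments are illustrative placeholders, not values from a real cluster.
    """
    target = f"{keyspace}.{table}"
    if columns:
        # Naming the columns explicitly keeps export and import order aligned.
        target += " (" + ", ".join(columns) + ")"
    statements = [
        f"COPY {target} TO '{csv_path}' WITH HEADER = true",    # 1. export
        f"DROP TABLE {keyspace}.{table}",                       # 2. drop
        create_stmt,                                            # 3. re-create
        f"COPY {target} FROM '{csv_path}' WITH HEADER = true",  # 4. import
    ]
    return [["cqlsh", host, "-e", stmt] for stmt in statements]


if __name__ == "__main__":
    commands = build_copy_roundtrip(
        "my_ks", "my_table",
        "CREATE TABLE my_ks.my_table (id int PRIMARY KEY, val text)",
        "/tmp/my_table.csv",
        columns=["id", "val"],
    )
    for argv in commands:
        print(shlex.join(argv))  # dry run: show the commands only
```

I'd review the printed commands and run them by hand, one at a time,
checking the exported CSV before dropping anything.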


This approach is predicated on two assumptions:

   - The re-created table has no knowledge of the history of the old table
   by the same name.
   - The amount of data in the table doesn't exceed what the COPY command
   can handle.


If a table with dropped columns lives in an environment where it holds more
data than COPY can handle, then we'd have to use some other mechanism to
capture and reload the data.

If you see something wrong about this approach or you have a better way to
do it, I'd be glad to hear from you.

On Tue, Feb 19, 2019 at 11:31 AM Jeff Jirsa <jji...@gmail.com> wrote:

> You can also manually add the dropped column to the appropriate table to
> eliminate the issue. It has to be done by a human: a new cluster would have
> no way of learning about a dropped column, and the missing metadata cannot
> be inferred.
>
>
> On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims <elli...@backblaze.com>
> wrote:
>
>> When a snapshot is taken, it includes a "schema.cql" file.  That should
>> be sufficient to restore whatever you need to restore.  I'd argue that
>> neither automatically resurrecting a dropped table nor silently failing to
>> restore it is a good behavior, so it's not unreasonable to have the user
>> re-create the table then choose if they want to re-drop it.
>>
>>
>> On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger <hkro...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would like to bring this issue to your attention.
>>>
>>> Link to the ticket:
>>> https://issues.apache.org/jira/browse/CASSANDRA-14336
>>>
>>> Basically, if a table contains dropped columns and you try to restore a
>>> snapshot to a new cluster, that will fail because of an error like
>>> "java.lang.RuntimeException: Unknown column XXX during deserialization".
>>>
>>> I feel this is quite a serious problem for the backup and restore
>>> functionality of Cassandra. You cannot restore a backup to a new cluster if
>>> columns have been dropped.
>>>
>>> There have been other similar tickets that have been apparently closed
>>> but based on my test with 3.11.4, the issue still persists.
>>>
>>> Best Regards,
>>> Hannu Kröger
>>>
>>
