Thank you for the suggestions.

On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:

> CQLSH definitely won't work for that amount of data, so you need to use
> other tools.
>
> But before selecting them, you need to define requirements. For example:
>
>    1. Are you copying the data into tables with exactly the same
>    structure?
>    2. Do you need to preserve metadata, like writetime & TTL?
>
> Depending on that, you may have the following choices:
>
>    - Use sstableloader - it will preserve all metadata, like TTL and
>    writetime. You just need to copy the SSTable files, or stream directly
>    from the source cluster. But this requires copying the data into tables
>    with exactly the same structure (and in the case of UDTs, the keyspace
>    names should be the same).
>    - Use DSBulk - it's a very effective tool for unloading & loading data
>    from/to Cassandra/DSE. Use zstd compression for the unloaded data to
>    save disk space (see the blog links below for more details). But
>    preserving metadata could be a problem.
>    - Use Spark + the Spark Cassandra Connector. But here too, preserving
>    the metadata is not an easy task, and it requires programming to handle
>    all the edge cases (see
>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details).
>
> Blog series on DSBulk:
>
>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>
> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> Hello,
>>
>> I would like to copy some data from one Cassandra cluster to another
>> Cassandra cluster using the CQLSH COPY command.
>> Is this a good approach
>> if the dataset size on the source cluster is very high (500 GB - 1 TB)?
>> If not, what is a safe approach? And are there any limitations/known
>> issues to keep in mind before attempting this?
>>
>
> --
> With best wishes,        Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
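
For reference, the two non-Spark options above might look roughly like this on the command line. This is a sketch only: the keyspace/table names, paths, and host addresses are hypothetical, and the DSBulk part assumes version 1.4 or later, where compression for the CSV connector was added.

```shell
# Option 1: sstableloader (preserves writetime/TTL; target schema must match).
# On a source node, snapshot the table to get a consistent set of SSTables:
nodetool snapshot -t migration my_keyspace

# Copy the snapshot files to a machine that can reach the target cluster,
# laid out as a keyspace/table directory, then stream them in
# (-d takes contact points of the target cluster):
sstableloader -d 10.0.1.1,10.0.1.2 /path/to/my_keyspace/my_table

# Option 2: DSBulk (does not preserve writetime/TTL by default).
# Unload from the source cluster as zstd-compressed CSV to save disk space:
dsbulk unload -h 10.0.0.1 -k my_keyspace -t my_table \
  -url /data/export --connector.csv.compression zstd

# Load the same files into the target cluster:
dsbulk load -h 10.0.1.1 -k my_keyspace -t my_table \
  -url /data/export --connector.csv.compression zstd
```

Note that snapshot files land under the table's data directory (in a `snapshots/<tag>` subdirectory), so for sstableloader you typically copy them into a `keyspace/table` directory structure first.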