You might also want to consider tools like https://github.com/Netflix/aegisthus for the last step, which can help you deal with tombstones and de-duplicate data.
Thanks,
Daniel

On Thu, Oct 9, 2014 at 12:19 AM, Gaurav Bhatnagar <gbhatna...@gmail.com> wrote:
> Hi,
> We have a Cassandra database column family containing 320 million rows,
> and each row contains about 15 columns. We want to take a monthly dump of
> this single column family in text format.
>
> We are planning the following approach to implement this functionality:
> 1. Take a snapshot of the Cassandra database using the nodetool utility,
>    passing the -cf flag to specify the column family name so that the
>    snapshot contains data for a single column family.
> 2. Take a backup of this snapshot and move the backup to a separate
>    physical machine.
> 3. Use the "SSTable to JSON conversion" utility to convert all the data
>    files into JSON format.
>
> We have the following questions/doubts regarding the above approach:
> a) Generated JSON records contain a "d" (IS_MARKED_FOR_DELETE) flag; can
>    I safely ignore all such records?
> b) If I ignore all records marked with the "d" flag, can the JSON files
>    generated in step 3 still contain duplicate records, i.e. multiple
>    entries for the same key?
>
> Is there any other, better approach to generating data dumps in text
> format?
>
> Regards,
> Gaurav
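
Regarding (b): the same row key can appear in more than one SSTable, so a plain per-file dump can contain multiple entries for one key, and a merge pass is needed before the "d"-flagged data can be safely discarded (a newer tombstone must shadow an older live value). Here is a minimal post-processing sketch; it assumes the pre-3.0 sstable2json output shape, where each file is a JSON array of rows like {"key": ..., "columns": [[name, value, timestamp, ...], ...]} and a fourth element "d" marks a deleted column. The function name and file paths are illustrative, not part of any Cassandra tool.

```python
import json

def merge_rows(files):
    """Merge sstable2json output files: de-duplicate rows by key,
    keep the newest version of each column, and drop tombstones."""
    rows = {}  # row key -> {column name -> (timestamp, value or None)}
    for path in files:
        with open(path) as f:
            for row in json.load(f):
                cols = rows.setdefault(row["key"], {})
                for col in row["columns"]:
                    name, value, ts = col[0], col[1], col[2]
                    deleted = len(col) > 3 and col[3] == "d"
                    prev = cols.get(name)
                    if prev is None or ts > prev[0]:
                        # Keep only the newest version; a newer tombstone
                        # must shadow older live values, so record it as None
                        cols[name] = (ts, None if deleted else value)
    # Discard tombstoned columns only after the newest version is known
    return {
        key: {n: v for n, (ts, v) in cols.items() if v is not None}
        for key, cols in rows.items()
    }
```

The key point is that tombstones are filtered last: dropping "d"-flagged entries per file, before merging, could resurrect an older value of the same column from another SSTable.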