Hello Julien,

After reading the excellent post and video by Alain Rodriguez, you may also
want to read the paper Performance Tuning of Big Data Platform: Cassandra
Case Study
<http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A948824&dswid=-5280>
by SATHVIK KATAM. In his results he sets new values for the memtable cleanup
threshold and the key cache size.
Although it is not proven that the same results will hold in other
environments, they are a good starting point.
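For reference, both parameters live in cassandra.yaml. The values below are
placeholders to show where they are set, not the tuned values from the paper:

```yaml
# cassandra.yaml (excerpt) -- illustrative placeholder values only.

# Fraction of the total memtable space that, when reached, triggers a flush
# of the largest memtable. The default is 1 / (memtable_flush_writers + 1).
memtable_cleanup_threshold: 0.25

# Size of the partition-key cache in MiB. If left empty, Cassandra picks
# min(5% of heap, 100 MiB) automatically.
key_cache_size_in_mb: 100
```

Changing either value requires a node restart, so it is worth testing one
node at a time under your real write load.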

Lucas Benevides

2018-01-30 6:12 GMT-02:00 Julien Moumne <jmou...@deezer.com>:

> Hello, I am looking for best practices for the following use case:
>
> Once a day, we insert at the same time 10 full tables (several 100GiB
> each) using Spark C* driver, without batching, with CL set to ALL.
>
> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, i.e. every command is an insert.
>
> This imposes a heavy load on the cluster (huge CPU consumption), and this
> load greatly impacts the constant reads we serve. Read latencies are fine
> the rest of the time.
>
> Is there any best practices we should follow to ease the load when
> importing data into C* except
>  - reducing the number of concurrent writes and throughput on the driver
> side
>  - reducing the number of compaction threads and throughput on the cluster
>
> In particular :
>  - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?
>  - because of the heavy writes, we use STCS (size-tiered compaction
> strategy). Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.
>
> (We manage data expiration with TTL set to several days.
> We use SSDs.)
>
> Thanks!
>
