Dimo, how do you generate the sstables? Do you mean loading the data locally on a Cassandra node and then using sstableloader?

On Fri, Aug 2, 2019, 5:48 PM Dimo Velev <dimo.ve...@gmail.com> wrote:

> Hi,
>
> Batches will actually slow down the process because they mean a different
> thing in C*: as you read, they just group together changes that you want
> executed atomically.
>
> Cassandra does not really have indices, so that is different from a
> relational DB. However, after writing stuff to Cassandra it generates many
> smallish partitions of the data. These are then joined together in the
> background to improve read performance.
>
> You have two options from my experience:
>
> Option 1: use the normal CQL API in async mode. This will create a high CPU
> load on your cluster. Depending on whether that is fine for you, that might
> be the easiest solution.
>
> Option 2: generate sstables locally and use the sstableloader to upload
> them into the cluster. The streaming does not generate high CPU load, so it
> is a viable option for clusters with other operational load.
>
> Option 2 scales with the number of cores of the machine generating the
> sstables. If you can split your data, you can generate sstables on multiple
> machines. In contrast, option 1 scales with your cluster. If you have a
> large cluster that is idling, it would be better to use option 1.
>
> With both options I was able to write at about 50-100K rows/sec on my
> laptop and a local Cassandra. The speed heavily depends on the size of your
> rows.
>
> Back to your question: I guess option 2 is similar to what you are used to
> from tools like sqlloader for relational DBMSes.
>
> I had a requirement of loading a few hundred million rows per day into an
> operational cluster, so I went with option 2 to offload the CPU load and
> reduce the impact on the reading side during the loads.
>
> Cheers,
> Dimo
>
> > On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
> >
> > Hi,
> >
> > I need to upload about 7 billion records to Cassandra. What is the best
> > setup of Cassandra for this task? Will using batches speed up the upload
> > (I've read somewhere that batches in Cassandra are dedicated to
> > atomicity, not to speeding up communication)? How does Cassandra work
> > internally with regard to indexing? In SQL databases, when uploading such
> > an amount of data, it is suggested to turn off indexing and then turn it
> > back on. Is something similar possible in Cassandra?
> >
> > Thanks for all suggestions.
> >
> > Pat
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
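
As a rough sanity check on the figures in the thread (this is not from the original posters): at the reported 50-100K rows/sec from a single loader, 7 billion rows works out to roughly 19 to 39 hours. A minimal Python sketch of that arithmetic follows. It assumes throughput stays constant for the whole load and that extra loaders scale linearly, neither of which is guaranteed in practice; `load_hours` is a hypothetical helper name, not part of any Cassandra tool.

```python
# Back-of-the-envelope load time for the numbers mentioned in the thread.
# Assumption: throughput is constant and scales linearly with the number of
# loaders; real loads vary with row size, compaction and cluster activity.

TOTAL_ROWS = 7_000_000_000        # "about 7 billion records"
RATES = (50_000, 100_000)         # rows/sec range reported in the thread


def load_hours(total_rows: int, rows_per_sec: int, loaders: int = 1) -> float:
    """Hours needed if `loaders` independent writers each sustain rows_per_sec."""
    if rows_per_sec <= 0 or loaders <= 0:
        raise ValueError("rows_per_sec and loaders must be positive")
    return total_rows / (rows_per_sec * loaders) / 3600


if __name__ == "__main__":
    for rate in RATES:
        for loaders in (1, 4):
            hours = load_hours(TOTAL_ROWS, rate, loaders)
            print(f"{rate:>7} rows/s x {loaders} loader(s): {hours:.1f} h")
```

Because option 2 scales with the machines generating sstables, the `loaders` parameter is the knob to play with there; for option 1 the per-loader rate is ultimately bounded by what the cluster can absorb.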