Make sure the driver is configured for token-aware routing; otherwise the node the driver picks as coordinator may not own the row and has to forward your write to a replica, adding a network hop.
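With the DataStax Python driver used later in this thread, a minimal sketch of that configuration might look like the following (the contact point, keyspace, and table schema are placeholders). Note that the driver can only compute the routing key for prepared statements, so token-aware routing pays off when the INSERT is prepared once and bound per row:

    from cassandra.cluster import Cluster
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    # Wrap the DC-aware policy so each request is sent straight to a replica
    # that owns the partition, avoiding the extra coordinator hop.
    cluster = Cluster(
        contact_points=['127.0.0.1'],  # placeholder contact point
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
    )
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    # Prepare once; the driver derives the routing key from the bound values.
    insert = session.prepare(
        "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")  # hypothetical schema
    session.execute(insert, (42, 0, 'payload'))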
To be absolutely clear, Cassandra uses the distributed, parallel model for Big Data: lots of multi-threaded clients talking to lots of nodes. A cluster with fewer than six or eight nodes, driven by a single, single-threaded client, is not a representative usage of Cassandra. Replication is presumed as well; anything less than RF=3 is simply not a representative or recommended usage of Cassandra. Similarly, writes at less than QUORUM are neither representative nor recommended.

CL=ONE still has to update the memtable as well, not just the commit log. Flushing to SSTables occurs once the memtables reach some threshold size.

See: http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

-- Jack Krupansky

On Thu, Dec 31, 2015 at 11:13 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> The limitation is on the driver side. Try looking at execute_concurrent_with_args in the cassandra.concurrent module to get parallel writes with prepared statements.
>
> https://datastax.github.io/python-driver/api/cassandra/concurrent.html
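A minimal sketch of that suggestion, using the module linked above (the contact point, keyspace, table schema, and data are placeholders):

    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(['127.0.0.1'])          # placeholder contact point
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    # Prepare the statement once; only the bound values travel per request.
    insert = session.prepare(
        "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")  # hypothetical schema

    # One parameter tuple per row; the helper keeps up to `concurrency`
    # requests in flight instead of waiting for each one to complete.
    params = [(i, 0, 'payload-%d' % i) for i in range(200000)]
    results = execute_concurrent_with_args(session, insert, params, concurrency=100)
    # `results` is a list of (success, result_or_exception) pairs.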
> On Wed, Dec 30, 2015 at 11:34 PM Alexandre Beaulne <alexandre.beau...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> First and foremost, thanks to everyone involved with making C* available to the world; it is a great technology to have access to.
>>
>> I'm experimenting with C* for one of our projects and I cannot reproduce the write speeds C* is lauded for. I would appreciate some guidance as to what I'm doing wrong.
>>
>> *Setup*: I have one single-threaded Python client (using DataStax's Python driver), writing (no reads) to a C* cluster. All C* nodes are launched by running the official Docker container. There is a single keyspace with a replication factor of 1, and the client is set to consistency level LOCAL_ONE. In that keyspace there is a single table with ~40 columns of mixed types. Two columns are set as the partition key and two more as clustering columns. The partition key is close to uniformly distributed in the dataset. The writer is in a tight loop, building CQL 3 insert statements one by one and executing them against the C* cluster.
>>
>> *Specs*: Cassandra v3.0.1, python-driver v3.0.0; the host is CentOS 7 with 40 cores @ 3 GHz and 66 GB of RAM.
>>
>> In the course of my experimentation I came up with 7 scenarios trying to isolate the performance bottleneck:
>>
>> *Scenario 1*: the writer simply builds the insert statement strings without doing anything with them.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 0.00 - [95] 0.01 - [99] 0.01 - [100] 0.05
>>
>> *Scenario 2*: the writer opens a TCP socket and sends each insert statement string to a simple reader running on the same host. The reader then appends that insert statement string to a file on disk, mimicking a commit log of some sort.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.02 - [99] 0.03 - [100] 63.33
>>
>> *Scenario 3*: identical to scenario 2, but the reader is run inside a Docker container, to measure whether there is any overhead from running in the container.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.01 - [99] 0.01 - [100] 4.45
>>
>> *Scenario 4*: the writer asynchronously executes the insert statements against a single-node C* cluster.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 0.07 - [95] 0.15 - [99] 0.56 - [100] 534.09
>>
>> *Scenario 5*: the writer synchronously executes the insert statements against a single-node C* cluster.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 1.40 - [95] 1.46 - [99] 1.54 - [100] 41.75
>>
>> *Scenario 6*: the writer asynchronously executes the insert statements against a four-node C* cluster.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 0.09 - [95] 0.14 - [99] 0.16 - [100] 838.83
>>
>> *Scenario 7*: the writer synchronously executes the insert statements against a four-node C* cluster.
>>
>> Results: sample size: 200002, percentiles (ms): [50] 1.73 - [95] 1.89 - [99] 2.15 - [100] 50.94
>>
>> Looking at scenarios 3 & 5, a synchronous write to C* is about 150x slower than appending to a flat file. Now, I understand a write to a DB is more involved than appending to a file, but I'm surprised by the magnitude of the difference. I thought all C* did for a write at consistency level ONE was append the write to its commit log and return, then distribute the write across the cluster in an eventually consistent manner. More than 1 ms per write is fewer than 1,000 writes per second, far from Big Data velocity.
>>
>> What am I doing wrong? Are writes supposed to be batched before being inserted? Instead of appending rows to the table, would it be more efficient to append columns to the rows? Why are writes so slow?
>>
>> Thanks for your time,
>> Alex
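For reference, the synchronous tight loop described in Scenarios 5 and 7 is presumably something close to the sketch below (contact point, schema, and data are placeholders). Each session.execute() blocks for a full client/coordinator round trip, so at ~1.4-1.7 ms per call a single thread tops out at a few hundred writes per second regardless of how fast the cluster itself can ingest; the token-aware, prepared, parallel approaches above are how the advertised write rates are reached.

    import time
    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])          # placeholder contact point
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    # Stand-in for the real ~200k-row dataset described in the post.
    rows = [(i, 0, 'payload-%d' % i) for i in range(200000)]

    latencies = []
    for pk, ck, value in rows:
        # Statement built as a string and executed one at a time, as described.
        stmt = ("INSERT INTO my_table (pk, ck, value) VALUES (%d, %d, '%s')"
                % (pk, ck, value))            # hypothetical schema
        start = time.time()
        session.execute(stmt)                 # blocks for one full round trip per row
        latencies.append((time.time() - start) * 1000.0)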