Thanks Aaron, that really helps!

2011/5/16 aaron morton <aa...@thelastpickle.com>
> batch_mutate() and insert() follow a similar execution path to a single
> insert in the server. It's not like putting multiple statements in a
> transaction in an RDBMS.
>
> Where they do differ is that you can provide multiple columns for a row in
> a column family, and these will be applied as one operation including only
> one write to the commit log. However, each row you send requires a write
> to the commit log.
>
> What sort of data are you writing? Are there multiple columns per row?
>
> Another consideration is that each row becomes a mutation in the cluster.
> If a connection sends 1000s of rows at once, all of its mutations *could*
> momentarily fill all the available mutation workers on a node. This can
> slow down other clients connected to the cluster if they also need to
> write to that node. Watch the TPStats to see if the mutation pool has
> spikes in the pending range. You may want to reduce the batch size if
> clients are seeing high latency.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15 May 2011, at 10:34, Xiaowei Wang wrote:
>
> > Hi,
> >
> > We use Cassandra 0.7.4 to do TPC-C data loading on EC2 nodes. The
> > loading driver is written in pycassa. We tested the loading speed of
> > insert and batch_insert, but there seems to be no significant
> > difference. I know Cassandra first writes data to memory, but I'm still
> > confused why batch_insert is not quicker than single-row insert. We
> > only batch 2000 or 3000 rows at a time.
> >
> > Thanks for your help!
> >
> > Best,
> > Xiaowei
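
[Editor's note: a minimal pycassa sketch of the two loading styles discussed
above. The keyspace, column family, server, and row names are illustrative
assumptions, not taken from the original thread.]

    import pycassa

    # Assumed names: keyspace 'TPCC', column family 'Orders', local node.
    pool = pycassa.ConnectionPool('TPCC', server_list=['localhost:9160'])
    orders = pycassa.ColumnFamily(pool, 'Orders')

    # Per-row insert: one Thrift round trip and one mutation per row.
    for i in range(2000):
        orders.insert('order-%d' % i, {'status': 'new', 'quantity': '1'})

    # batch_insert: one Thrift round trip carrying all 2000 rows, but on
    # the server each row is still its own mutation with its own
    # commit-log write, which is why the two approaches load at a
    # similar rate.
    rows = dict(('order-%d' % i, {'status': 'new', 'quantity': '1'})
                for i in range(2000))
    orders.batch_insert(rows)

To watch for the mutation-pool spikes Aaron mentions, check the pending
column for MutationStage in the thread-pool stats on each node:

    nodetool -h <node-hostname> tpstats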