Re: Performance Difference between Batch Insert and Bulk Load

2014-12-02 Thread Dong Dai
Yes. Thanks. I will reply on the user mailing list. Sorry for the inconvenience. - Dong > On Dec 2, 2014, at 9:33 AM, Aleksey Yeschenko wrote: > > Guys, please move this discussion to the users mailing list. This one is for > Cassandra committers and other contributors, to discuss development of

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-02 Thread Aleksey Yeschenko
Guys, please move this discussion to the users mailing list. This one is for Cassandra committers and other contributors, to discuss development of Cassandra itself. -- AY > On Dec 2, 2014, at 16:17, Ryan Svihla wrote: > > misspoke > > "That's all correct but what you're not accounting for is if

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-02 Thread Ryan Svihla
Misspoke: "That's all correct but what you're not accounting for is if you use a token aware client then the coordinator will likely not own all the data in a batch" should just be "That's all correct but what you're not accounting for is the coordinator will likely not own all the data in a batch
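To illustrate the corrected statement, here is a toy sketch of why a single coordinator rarely owns every row in a multi-partition batch. It is only a model under loud assumptions: `owner` and `nodes_touched` are hypothetical helpers, placement is an MD5 hash taken modulo the node count, and replication is ignored. Cassandra's real partitioner and token ring work differently.

```python
import hashlib


def owner(partition_key, num_nodes):
    """Toy placement (assumption, not Cassandra's partitioner):
    hash the key and map it onto one of num_nodes nodes."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_touched(batch_keys, num_nodes):
    """Return the set of node indexes that own at least one key in the batch."""
    return {owner(key, num_nodes) for key in batch_keys}


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(100)]
    touched = nodes_touched(keys, num_nodes=6)
    # Whichever node coordinates this batch owns only part of it
    # and has to forward the remaining mutations to the other nodes.
    print(f"batch of {len(keys)} keys touches {len(touched)} of 6 nodes")
```

In this model a batch whose rows all share one partition key touches exactly one node, which is the case where batching avoids the extra forwarding.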

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-02 Thread Ryan Svihla
On Mon, Dec 1, 2014 at 1:52 PM, Dong Dai wrote: > Thanks Ryan, and also thanks for your great blog post. > > However, this makes me more confused. Mainly about the coordinators. > > Based on my understanding, no matter whether it is batch insertion, ordinary sync > insert, or async insert, > the coordina

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Dong Dai
Thanks Ryan, and also thanks for your great blog post. However, this makes me more confused, mainly about the coordinators. Based on my understanding, no matter whether it is batch insertion, ordinary sync insert, or async insert, the coordinator was only selected once for the whole session by calling

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Ryan Svihla
So there is a bit of a misunderstanding about the role of the coordinator in all this. If you use an UNLOGGED BATCH and all of those writes are in the same partition key, then yes, it's a savings and acts as one mutation. If they're not, however, you're asking the coordinator node to do work the clie
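One way to act on the same-partition-key point is to group rows on the client before batching, so each UNLOGGED BATCH covers a single partition. The sketch below only builds CQL text and talks to no cluster. The helper names, the `readings` table, and the `sensor_id` column are made up for illustration, and `cql_literal` handles only strings and numbers. Real code would more likely use prepared statements through a driver.

```python
from collections import defaultdict


def cql_literal(value):
    """Render a Python value as a CQL literal (strings and numbers only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def group_by_partition(rows, partition_key):
    """Group row dicts by the value of their partition key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return dict(groups)


def build_unlogged_batches(rows, table, partition_key):
    """Return one UNLOGGED BATCH statement per distinct partition key value."""
    batches = []
    for group in group_by_partition(rows, partition_key).values():
        lines = ["BEGIN UNLOGGED BATCH"]
        for row in group:
            cols = ", ".join(row)
            vals = ", ".join(cql_literal(v) for v in row.values())
            lines.append(f"  INSERT INTO {table} ({cols}) VALUES ({vals});")
        lines.append("APPLY BATCH;")
        batches.append("\n".join(lines))
    return batches


if __name__ == "__main__":
    rows = [
        {"sensor_id": "a", "ts": 1, "value": 1.5},
        {"sensor_id": "b", "ts": 1, "value": 2.0},
        {"sensor_id": "a", "ts": 2, "value": 1.7},
    ]
    for batch in build_unlogged_batches(rows, "readings", "sensor_id"):
        print(batch, end="\n\n")
```

Rows for sensor `a` end up in one batch and the row for sensor `b` in another, so no batch spans partitions.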

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Dong Dai
Thanks a lot for the reply, Raj. I understand they are different. But if we define a BATCH as UNLOGGED, it does not guarantee atomicity and becomes more like a data import tool. As far as I know, a BATCH statement packs several mutations into one RPC to save time. Similarly

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Rajanarayanan Thottuvaikkatumana
The BATCH statement and Bulk Load are totally different things. The BATCH statement belongs to the atomic transaction space: it provides a way to combine more than one statement into an atomic unit, whereas the bulk loader provides the ability to bulk load external data into a cluster. Two are totally differen