Today I've also seen this benchmark on Chinese websites. "SequoiaDB" seems to come from a Chinese startup, and in the db-engines ranking <http://db-engines.com/en/ranking> its score is 0.00. So, IMO, this benchmark is a "soft sell". They compare three databases, two written in C++ and one in Java, and use a very tricky test case that prevents Cassandra from holding all the data in memtables. After all, Java needs more memory than C++. For an on-disk database, the data size on one node is generally much larger than RAM, so its in-memory query performance matters less than its on-disk query performance.
So I think this benchmark has no value at all.

2014-12-19 14:47 GMT+08:00 Wilm Schumacher <wilm.schumac...@gmail.com>:

> Hi,
>
> I'm always interested in such benchmark experiments, because the
> databases evolve so fast that the race is always open and there is a lot
> of motion in there.
>
> And of course I asked myself the same question. And I think that this
> publication is unreliable, for 4 reasons (from reading very fast; perhaps
> there are more):
>
> 1.) It is unclear what this is all about. The title is "NoSQL Performance
> Testing". The subtitle is "In-Memory Performance Comparison of SequoiaDB,
> Cassandra, and MongoDB". However, in the introduction there is not one
> word about "in-memory performance". The introduction could be a general
> introduction for a general "on-disk NoSQL" benchmark. So ... only the
> subtitle (and a short sentence in the "Result Summary") says what this is
> actually about.
>
> 2.) Very important databases are missing. For "in memory", e.g.
> Redis. If Redis is not a valid candidate in this race, why is this
> so? MySQL is capable of "in memory" distributed databanking, too.
>
> 3.) The methodology is unclear. Perhaps I'm the only one, but what does
> "Run workload for 30 minutes (workload file workload[1-5])" mean for mixed
> read/write ops? Why 30 min? Okay, I can imagine that the authors estimated
> the throughput, preset the number of rows to 100 million, and chose it to be
> larger than the estimated throughput over x minutes. However, all this
> information is missing. And why 45% and 22% of RAM? My first idea would be
> a VERY low ratio, like 2% or so, and a VERY high ratio, like 80-90%, and
> then everything in between. Is 22% or 45% somehow a magic number?
> Furthermore, the Result Summary discusses 1/2 and 1/4 of RAM.
> Okay, 22% is near 1/4 ... but where does the difference come from? And
> btw. ... 22% of what? Stuff to insert? Stuff already inserted? It's all
> deducible, but it's strange that the description is so sloppy.
>
> 4.) There is no repetition of the loads (as I understand it). It's one run,
> one result ... and it's done. I don't know a lot about Cassandra in in-memory
> use. But either the experiment should be repeated for quite a few runs OR it
> should be explained why this is not necessary.
>
> Okay, perhaps 1 is a little picky, and 4 is a little fussy. But 3 is
> strange and 2 stinks.
>
> Well, just my first impression. And that Cassandra is very fast ;).
>
> Best regards
>
> Wilm
>
>
> On 19.12.2014 at 06:41, diwayou wrote:
>
> > I just read this benchmark PDF. Does anyone have an opinion
> > about it?
> > I think it's not fair to Cassandra.
> > url:
> > http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
> >
> > http://msrg.utoronto.ca/papers/NoSQLBenchmark