Today I've also seen this benchmark on Chinese websites. "SequoiaDB" seems
to come from a Chinese startup company, and in the db-engines ranking
<http://db-engines.com/en/ranking> its score is 0.00. So IMO this benchmark
is a "soft sell". They compare three databases, two written in C++ and one
in Java, and use a very tricky test case so that Cassandra cannot hold all
the data in memtables. After all, Java needs more memory than C++. For an
on-disk database, the data size on one node is generally much larger than
RAM, and its in-memory query performance is less important than its on-disk
query performance.

So I think this benchmark has no value at all.

2014-12-19 14:47 GMT+08:00 Wilm Schumacher <wilm.schumac...@gmail.com>:
>
>  Hi,
>
> I'm always interested in such benchmark experiments, because the
> databases evolve so fast that the race is always open and there is a lot
> of motion in there.
>
> And of course I asked myself the same question. And I think that this
> publication is unreliable, for 4 reasons (from reading very fast, perhaps
> there are more):
>
> 1.) It is unclear what this is all about. The title is "NoSQL Performance
> Testing". The subtitle is "In-Memory Performance Comparison of SequoiaDB,
> Cassandra,  and MongoDB". However, in the introduction there is not one
> word about "in memory performance". The introduction could be a general
> introduction for a general "on-disk-nosql" benchmark. So ... only the
> subtitle (and a short sentence in the "Result Summary") says what this is
> actually about.
>
> 2.) There are very important databases missing. For "in memory" e.g.
> Redis. If e.g. Redis is not a valid candidate in this race, why is this
> so? MySQL is capable of "in memory" distributed databanking, too.
>
> 3.) The methodology is unclear. Perhaps I'm the only one, but what does
> "Run workload for 30 minutes (workload file workload[1-5]) " mean for mixed
> read/write ops? Why 30 min? Okay, I can imagine that the authors estimated
> the throughput, preset the number of 100 million rows and designed it to be
> larger than the estimated throughput in x minutes. However, all this
> information is missing. And why 45% and 22% of RAM? My first idea would be
> a VERY low ratio, like 2% or so, and a VERY large ratio, like 80-90%. And
> then everything in between. Is 22% or 45% somehow a magic number?
> Furthermore, in the Result Summary 1/2 and 1/4 of RAM are discussed.
> Okay, 22% is near 1/4 ... but where does the difference come from? And
> btw. ... 22% of what? Stuff to insert? Stuff already inserted? It's all
> deducible, but it's strange that the description is so sloppy.
>
> 4.) There is no repetition of the loads (as I understand). It's one run,
> one result ... and it's done. I don't know a lot about Cassandra in
> in-memory use. But either the experiment should be repeated for quite a
> few runs OR it should be explained why this is not necessary.
>
> Okay, perhaps 1 is a little picky, and 4 is a little fussy. But 3 is
> strange and 2 stinks.
>
> Well, just my first impression. And that Cassandra is very fast ;).
>
> Best regards
>
> Wilm
>
>
> On 19.12.2014 at 06:41, diwayou wrote:
>
>   I have just read this benchmark PDF. Does anyone have an opinion
> about it?
> I think it's not fair to Cassandra.
> url:
> http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
>
> http://msrg.utoronto.ca/papers/NoSQLBenchmark
>
>
>
