A couple of things I would like to note:

1. Cassandra does not determine how data is stored on disk; the compaction strategy does. One could, in theory (and I believe some are trying), create a column-store compaction strategy. There is a large effort in the database community overall to separate query execution from the storage engine, so it is becoming increasingly incorrect to say a database is an "X store" database.

2. "X-store" is not used, and never has been, to describe how data is represented or queried. When database storage engines describe their storage as "X-store", they are referring to which bytes are contiguous on disk. In traditional row-store engines, on a single node, the definition is: "All data for a row is stored as a single block of contiguous bytes on disk". Traditional column-stores are likewise defined as: "All data for a column is stored contiguously on disk". Old-style Cassandra was a key-value column-family store in that "all data for a family of columns belonging to a given key was stored contiguously on disk".

So when talking about Cassandra and all currently merged compaction strategies, yes, it fits the definition of a row-store in that "All data for a row is stored as contiguous bytes on disk". However, it goes further, because "All data for all rows in a given partition is stored as contiguous bytes on disk". So at the highest level one could say it is a "Partition-store", but that is pretty vague. I think it deserves a different name, which is why I like the term "Partitioned-row-store": it gives insight into the fact that it is rows being stored on disk, in a partitioned format.

PS. To address the pedants: yes, by these definitions you would have to assume that a partition resides in a single SSTable. While most compaction strategies try hard to achieve this, only one that I know of currently achieves it. You could call it a "Partitioned-row-dependent-upon-compaction-strategy-store", but that is just terrible.

On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is structured by a schema with the data for each row stored together in each data file.
> Just because it uses log-structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from being called a "row store".
>
> Postgres added flexible storage through hstore; I don't hear anyone arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now; does that make it not a row store?
>
> You're arguing that everything is wrong but you're not proposing an alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> does it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>
>> Absolutely. A "partitioned row store" is exactly what I would call it. As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment. HBase literally calls itself a "*column-oriented* store", which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming us poor "wide column stores" if even one of the major examples doesn't know what it, itself, is!
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says.
>> I usually call it a "partitioned row store", which usually needs some extra explanation but is more accurate than "column family" or whatever other thrift-era terminology people still use.
>>
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed" tables. This definition is closer to CQL and has some academic background (distributed hash table).
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore. It has a schema. Only thrift users no longer think they have a schema (though they do), and thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with fire. It seems to have never meant anything beyond "schema-less, row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column store" existed long before "wide column store", and the latter is often abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this awful nomenclature ever existed.
>>
>> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s), but does provide the option of setting zero or more clustering keys. Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key; however, when reading from Cassandra, you only need to provide the full partition key.
>>
>> When you only provide the partition key for a read operation, you're able to return all columns that exist on that partition with low latency. These columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>   boz uuid,
>>   baz timeuuid,
>>   data1 text,
>>   data2 text,
>>   PRIMARY KEY ((bar, boz), baz)
>> );
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and optionally data*, if it's relevant for that CQL row. If you choose not to define a data* field for a particular CQL row, then nothing is stored nor allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra replica set (a configurable number of nodes) and be stored in the same place(s) on disk within the SSTable(s). And on disk, each field that's not a partition key is stored as a column, including clustering keys (this is optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz either over time, or for a specific time, with roughly the same number of disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>
>> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:
>>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> Column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row-oriented schema-less DBMS... Is it true for you also?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.b...@dbi-services.com
>> www.dbi-services.com
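
The three definitions given at the top of this thread (row store, column store, partitioned row store) can be made concrete with a toy model. This is only a sketch and not Cassandra's SSTable format: the sample rows and function names are made up for illustration, partitions are ordered by key rather than by token, and encoding and compaction are ignored. It only shows which values end up next to each other when the same table is flattened into one sequence under each definition.

```python
from collections import defaultdict

# Hypothetical sample table: (partition_key, clustering_key, name, salary).
ROWS = [
    ("dept10", 1, "Smith", 40000),
    ("dept12", 1, "Jones", 50000),
    ("dept10", 2, "Johnson", 44000),
]


def row_store(rows):
    """All values of a row are adjacent; rows keep their insertion order."""
    return [value for row in rows for value in row]


def column_store(rows):
    """All values of a column are adjacent."""
    return [value for column in zip(*rows) for value in column]


def partitioned_row_store(rows):
    """All rows of a partition are adjacent, sorted by clustering key,
    and the values of each row stay adjacent as well.

    Assumption for this sketch: partitions are emitted in sorted key order.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)

    layout = []
    for key in sorted(partitions):
        for row in sorted(partitions[key], key=lambda r: r[1]):
            layout.extend(row)
    return layout


if __name__ == "__main__":
    print("row store:            ", row_store(ROWS))
    print("column store:         ", column_store(ROWS))
    print("partitioned row store:", partitioned_row_store(ROWS))
```

With these sample rows, the row-store sequence follows insertion order, the column-store sequence lists all keys before any names, and the partitioned version moves both dept10 rows next to each other while keeping each row's values together.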