Nobody is claiming Cassandra is a relational database; I'm not sure why that keeps coming up.

On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
> My original point can be summed up as:
>
> Do not define Cassandra in terms of SMILES & METAPHORS. Such words include "like" and "close relative".
>
> For the specifics:
>
> > Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store?
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result proven in his seminal work on the relational model, equates the expressive power of relational algebra <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of which, lacking recursion, are strictly less powerful than first-order logic <https://en.wikipedia.org/wiki/First-order_logic>).[citation needed]
>
> As the relational model started to become fashionable in the early 1980s, Codd fought a sometimes bitter campaign to prevent the term being misused by database vendors who had merely added a relational veneer to older technology. As part of this campaign, he published his 12 rules <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what constituted a relational database. This made his position in IBM increasingly difficult, so he left to form his own consulting company with Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I do not believe Cassandra is a "row store".
>
> > "Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store""
>
> What is the definition of "row store"? Is it a logical construct or a physical one?
>
> Why isn't MongoDB a "row store"?
> I can drop a schema on top of MongoDB and present it as rows and columns. It seems to pass the litmus test being presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is structured by a schema, with the data for each row stored together in each data file. Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store".
>
> Postgres added flexible storage through hstore, and I don't hear anyone arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store?
>
> You're arguing that everything is wrong but you're not proposing an alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
> 001:10,40000;
>
> which is much closer to how Cassandra *stores* its data.
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> Absolutely. A "partitioned row store" is exactly what I would call it. As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment.
> HBase literally calls itself a "*column-oriented* store", which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming us poor "wide column stores" if even one of the major examples doesn't know what it, itself, is!
>
> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store", which needs some extra explanation but is more accurate than "column family" or whatever other Thrift-era terminology people still use.
>
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" tables. This definition is closer to CQL and has some academic background (distributed hash tables).
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore. It has a schema. Only Thrift users still believe they have no schema (though they do), and Thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire. It never seems to have meant anything beyond "schema-less, row-oriented", and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column store" existed long before "wide column store", and the latter is often abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this awful nomenclature ever existed.
>
> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store.
> Theoretically, a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns.
>
> Cassandra distributes data across nodes based on the partition key(s), and also lets you define zero or more clustering keys. Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you need to provide the full primary key; however, when reading from Cassandra, you only need to provide the full partition key.
>
> When you provide only the partition key for a read operation, you can return all the columns that exist in that partition with low latency. These columns are displayed as "CQL rows" to make them easier to reason about.
>
> Consider the schema:
>
>     CREATE TABLE foo (
>         bar uuid,
>         boz uuid,
>         baz timeuuid,
>         data1 text,
>         data2 text,
>         PRIMARY KEY ((bar, boz), baz)
>     );
>
> When you write to Cassandra, you need to send bar, boz, and baz, and optionally data*, if it's relevant for that CQL row. If you choose not to define a data* field for a particular CQL row, then nothing is stored or allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra replica set (a configurable number of nodes) and be stored in the same place(s) on disk within the SSTable(s). And on disk, each field that's not a partition key is stored as a column, including clustering keys (this is optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz, either over time or for a specific time, with roughly the same number of disk seeks but varying lengths of disk scans.
>
> Hope that helps!
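Joaquin's description of `foo` can be made concrete with a small in-memory model. This is a toy sketch, not Cassandra's storage engine or partitioner: it assumes a hypothetical four-node cluster with a replication factor of 3, uses an MD5-based hash ring as a stand-in for Cassandra's real partitioner, and uses integers in place of `timeuuid` values for `baz`. The names `replicas_for`, `Partition`, and `FooTable` are hypothetical. It shows writes requiring the full primary key, undeclared columns being rejected, omitted `data*` columns taking no space, and reads by partition key returning CQL rows in clustering order, either whole or sliced on `baz`.

```python
import bisect
import hashlib
import uuid

# Toy model of the `foo` table above, NOT Cassandra's storage engine.
# Assumptions: a hypothetical 4-node cluster, RF=3, an MD5 hash ring standing
# in for Cassandra's partitioner, and integers standing in for timeuuid values.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3
DATA_COLUMNS = ("data1", "data2")


def replicas_for(partition_key):
    """Hash the full partition key (bar, boz) and pick RF consecutive nodes."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


class Partition:
    """Every CQL row sharing one (bar, boz), kept in clustering-key order."""

    def __init__(self):
        self.keys = []  # clustering keys (baz), always sorted
        self.rows = {}  # baz -> only the data* columns that were written

    def upsert(self, baz, cells):
        if baz not in self.rows:
            bisect.insort(self.keys, baz)
            self.rows[baz] = {}
        self.rows[baz].update(cells)  # later writes merge into the same row


class FooTable:
    def __init__(self):
        self.partitions = {}  # (bar, boz) -> Partition

    def insert(self, bar, boz, baz, **data):
        # Writes must carry the full primary key ((bar, boz), baz).
        if bar is None or boz is None or baz is None:
            raise ValueError("full primary key required: bar, boz, baz")
        # The schema is enforced: only declared columns are accepted.
        unknown = set(data) - set(DATA_COLUMNS)
        if unknown:
            raise ValueError(f"columns not in schema: {sorted(unknown)}")
        # Omitted data* columns are simply absent: nothing is stored for them.
        self.partitions.setdefault((bar, boz), Partition()).upsert(baz, dict(data))

    def select(self, bar, boz, start=None, end=None):
        """Read by full partition key, optionally sliced on baz (inclusive)."""
        part = self.partitions.get((bar, boz))
        if part is None:
            return []
        lo = 0 if start is None else bisect.bisect_left(part.keys, start)
        hi = len(part.keys) if end is None else bisect.bisect_right(part.keys, end)
        return [{"baz": baz, **part.rows[baz]} for baz in part.keys[lo:hi]]


t = FooTable()
bar, boz = uuid.uuid4(), uuid.uuid4()
t.insert(bar, boz, 30, data1="c")
t.insert(bar, boz, 10, data1="a", data2="x")
t.insert(bar, boz, 20)  # no data* columns: in this model the row holds no cells
print(replicas_for((bar, boz)))  # three distinct nodes from NODES
print(t.select(bar, boz))
# [{'baz': 10, 'data1': 'a', 'data2': 'x'}, {'baz': 20}, {'baz': 30, 'data1': 'c'}]
print(t.select(bar, boz, start=15, end=30))
# [{'baz': 20}, {'baz': 30, 'data1': 'c'}]
```

Keeping each partition's rows sorted by `baz` is what lets both kinds of read locate one partition and then scan a contiguous run of rows, whose length depends on how much of the partition is requested.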
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:
>
> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> A column store means storing the data as columns rather than as rows.
>
> In fact, C* stores the data as rows, and data is partitioned by row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that true for you as well?
>
> Many thanks in advance for your reply.
>
> Best Regards
> Mehdi Bada
> ----
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
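Mehdi's question turns on physical layout. The sketch below serializes the four example records from Edward's message three ways: row-oriented (each record's values together), column-oriented (each column's values together, tagged with row ids, following the Wikipedia depiction he linked), and the key-per-cell layout he contrasts them with. The string formats and the function names are illustrative assumptions, not the on-disk format of Cassandra or any other engine, and whether the third layout describes Cassandra is exactly what the thread disputes.

```python
# The four example records quoted earlier in the thread:
# row id -> (EmpId, Lastname, Firstname, Salary)
RECORDS = {
    "001": (10, "Smith", "Joe", 40000),
    "002": (12, "Jones", "Mary", 50000),
    "003": (11, "Johnson", "Cathy", 44000),
    "004": (22, "Jones", "Bob", 55000),
}
NUM_COLUMNS = 4


def row_oriented(records):
    """Row store: all of a record's values are written next to each other."""
    return "".join(
        f"{rid}:{','.join(str(v) for v in values)};" for rid, values in records.items()
    )


def column_oriented(records):
    """Column store: all values of one column are written next to each other."""
    return "".join(
        ",".join(f"{values[col]}:{rid}" for rid, values in records.items()) + ";"
        for col in range(NUM_COLUMNS)
    )


def key_per_cell(records):
    """Edward's contrasting depiction: the key (row id + EmpId) is repeated in
    front of every remaining value, one entry per cell."""
    return [f"{rid}:{values[0]},{v};" for rid, values in records.items() for v in values[1:]]


print(row_oriented(RECORDS))
# 001:10,Smith,Joe,40000;002:12,Jones,Mary,50000;003:11,Johnson,Cathy,44000;004:22,Jones,Bob,55000;
print(column_oriented(RECORDS))
# 10:001,12:002,11:003,22:004;Smith:001,Jones:002,Johnson:003,Jones:004;Joe:001,Mary:002,Cathy:003,Bob:004;40000:001,50000:002,44000:003,55000:004;
print(key_per_cell(RECORDS)[:3])
# ['001:10,Smith;', '001:10,Joe;', '001:10,40000;']
```

A query that only needs `Salary` can read the last segment of the column-oriented string without touching the names, whereas the row-oriented string interleaves every column of every record.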