Nobody is disputing that the docs can and should be improved to avoid this misreading. I've invited Ed to file a JIRA and/or pull request twice now.
You are of course just as welcome to do this. Perhaps you will actually do it, so we can all move on with our lives!

On 3 October 2016 at 17:45, Peter Lin <wool...@gmail.com> wrote:

> I've met clients that read the Cassandra docs and then said in a big meeting "it's just like a relational database, it has tables just like SQL Server/Oracle."
>
> I'm not putting words in other people's mouths either, but I've heard that said enough times to want to puke. Do the docs claim Cassandra is relational? They absolutely don't make that claim, but the docs play loosey-goosey with terminology. The end result is that it confuses new users who aren't experts, or technology managers who try to make a case for Cassandra.
>
> We can make all the excuses we want, but that doesn't change the fact that the docs aren't user friendly. Writing great documentation is tough and most developers hate it. It's cuz we suck at it. There, I said it: "we SUCK at writing user friendly documentation". As many people have pointed out, it's not unique to Cassandra. 80% of the tech docs out there suck, starting with IBM at the top.
>
> Saying the docs suck isn't an indictment of anyone, it's just the reality of writing good documentation.
>
> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Nobody is claiming Cassandra is relational. I'm not sure why that keeps coming up.
>>
>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> My original point can be summed up as:
>>>
>>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include "like" and "close relative".
>>>
>>> For the specifics:
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store?
>>>
>>> Let's draw some lines: a relational database is clearly defined.
>>>
>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>
>>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result proven in his seminal work on the relational model, equates the expressive power of relational algebra <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of which, lacking recursion, are strictly less powerful than first-order logic <https://en.wikipedia.org/wiki/First-order_logic>).[citation needed]
>>>
>>> As the relational model started to become fashionable in the early 1980s, Codd fought a sometimes bitter campaign to prevent the term being misused by database vendors who had merely added a relational veneer to older technology. As part of this campaign, he published his 12 rules <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what constituted a relational database. This made his position in IBM increasingly difficult, so he left to form his own consulting company with Chris Date and others.
>>>
>>> Cassandra is not a relational database.
>>>
>>> I have attempted to illustrate that a "row store" is defined as well. I do not believe Cassandra is a "row store".
>>>
>>> "Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store""
>>>
>>> What is the definition of "row store"? Is it a logical construct or a physical one?
>>>
>>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and present it as rows and columns. It seems to pass the litmus test being presented.
>>>
>>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>>
>>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>> Sorry Ed, but you're really stretching here.
>>> A table in Cassandra is structured by a schema, with the data for each row stored together in each data file. Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store".
>>>
>>> Postgres added flexible storage through hstore; I don't hear anyone arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an alternative, which is not productive.
>>>
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>
>>> Also, every piece of technical information that describes a row store
>>>
>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>
>>> shows it like this:
>>>
>>> 001:10,Smith,Joe,40000;
>>> 002:12,Jones,Mary,50000;
>>> 003:11,Johnson,Cathy,44000;
>>> 004:22,Jones,Bob,55000;
>>>
>>> They never depict a scenario where the data looks like this on disk:
>>>
>>> 001:10,Smith
>>>
>>> 001:10,40000;
>>>
>>> That is much closer to how Cassandra *stores* its data.
>>>
>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>>
>>> Absolutely. A "partitioned row store" is exactly what I would call it. As it happens, our README thinks the same, which is fantastic.
>>>
>>> I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment. HBase literally calls itself a "*column-oriented* store" - which is so totally wrong it's simultaneously hilarious and tragic.
>>>
>>> I guess we can't blame the wider internet for misunderstanding/misnaming us poor "wide column stores" if even one of the major examples doesn't know what it, itself, is!
>>>
>>> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>> +1000 to what Benedict says. I usually call it a "partitioned row store", which usually needs some extra explanation but is more accurate than "column family" or whatever other Thrift-era terminology people still use.
>>>
>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed" tables. This definition is closer to CQL and has some academic background (distributed hash table).
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>>
>>> Cassandra is not a "wide column store" anymore. It has a schema. Only Thrift users think they don't have a schema (though they do), and Thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with fire. It seems to have never meant anything beyond "schema-less, row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column store" existed long before "wide column store", and the latter is often abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget this awful nomenclature ever existed.
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> wrote:
>>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition key(s), but does provide the option of setting zero or more clustering keys. Together, the partition key(s) and clustering key(s) form the primary key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary key; however, when reading from Cassandra, you only need to provide the full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're able to return all columns that exist on that partition with low latency. These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>     bar uuid,
>>>     boz uuid,
>>>     baz timeuuid,
>>>     data1 text,
>>>     data2 text,
>>>     PRIMARY KEY ((bar, boz), baz)
>>> );
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and optionally data*, if it's relevant for that CQL row. If you choose not to define a data* field for a particular CQL row, then nothing is stored or allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same Cassandra replica set (a configurable number of nodes) and be stored in the same place(s) on disk within the SSTable(s). And on disk, each field that's not a partition key is stored as a column, including clustering keys (this is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz either over time, or for a specific time, with roughly the same number of disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>>
>>> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>>>
>>> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>>>
>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have a theoretical question:
>>> - Is Apache Cassandra really a column store?
>>> A column store means storing the data as columns rather than as rows.
>>>
>>> In fact, C* stores the data as rows, and the data is partitioned by row key.
>>>
>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS... Is it true for you also?
>>>
>>> Many thanks in advance for your reply
>>>
>>> Best Regards
>>> Mehdi Bada
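The on-disk layouts argued over in this thread can be made concrete with a small sketch. The Python below takes the four-row employee table from Edward Capriolo's row-store illustration and serializes it three ways: row-oriented, column-oriented, and one entry per cell in the spirit of his "001:10,Smith" depiction. The column names `empid`, `lastname`, `firstname` and `salary` are assumed labels, and all three string formats are hypothetical; none of them is Cassandra's actual SSTable encoding, which (as Joaquin Casares notes) changed in Cassandra 3+.

```python
# Employee table from the row-store illustration in the thread.
# Column names are assumed labels; the string formats are illustrative only.
COLUMNS = ("empid", "lastname", "firstname", "salary")
ROWS = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]


def row_oriented(rows):
    """Row store: every value of a row is written contiguously, one record per row."""
    return [f"{rid}:{','.join(str(v) for v in vals)};" for rid, *vals in rows]


def column_oriented(rows):
    """Column store: every value of a column is written contiguously, tagged with its row id."""
    return [
        ",".join(f"{row[i + 1]}:{row[0]}" for row in rows) + ";"
        for i in range(len(COLUMNS))
    ]


def cell_oriented(rows):
    """One entry per (row key, column) cell, repeating the row key for every cell."""
    return [
        f"{rid}:{name}={value};"
        for rid, *vals in rows
        for name, value in zip(COLUMNS, vals)
    ]


if __name__ == "__main__":
    for label, layout in (("row", row_oriented), ("column", column_oriented), ("cell", cell_oriented)):
        print(label, layout(ROWS))
```

Reading one whole employee touches one contiguous record in the first layout, one position in each of the four column runs in the second, and four separate keyed entries in the third; which of these physical shapes earns the name "row store" is exactly what the thread disputes.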
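Joaquin Casares's description of partition and clustering keys for the `foo` table can be mimicked with a toy in-memory model. The class name `ToyTable`, its `write`/`read`/`node_for` methods, the three-node cluster, and the MD5-based placement are all hypothetical choices for illustration; Cassandra's default partitioner actually maps partition keys to tokens with Murmur3, and replication and SSTables are not modeled. Plain timestamp strings stand in for `timeuuid` values. The sketch only encodes the rules stated in the thread: a write must carry the full primary key `(bar, boz, baz)`, a read must carry the full partition key `(bar, boz)`, rows inside a partition stay ordered by `baz`, and `data*` columns that were never set take no space.

```python
import bisect
import hashlib

# Toy model of the thread's table:  PRIMARY KEY ((bar, boz), baz)
PARTITION_KEY = ("bar", "boz")  # decides which partition (and node) a row lands on
CLUSTERING_KEY = "baz"          # orders rows inside one partition


class ToyTable:
    def __init__(self, num_nodes=3):
        self.num_nodes = num_nodes
        # (bar, boz) -> list of (baz, {column: value}), kept sorted by baz
        self.partitions = {}

    def node_for(self, bar, boz):
        """Hypothetical placement: hash the full partition key onto one of N nodes.
        Cassandra's default partitioner uses Murmur3 tokens instead."""
        digest = hashlib.md5(f"{bar}|{boz}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_nodes

    def write(self, **values):
        """Upsert one CQL row; the full primary key must be supplied."""
        required = PARTITION_KEY + (CLUSTERING_KEY,)
        missing = [c for c in required if c not in values]
        if missing:
            raise ValueError(f"write needs the full primary key, missing {missing}")
        pk = tuple(values.pop(c) for c in PARTITION_KEY)
        ck = values.pop(CLUSTERING_KEY)
        rows = self.partitions.setdefault(pk, [])
        i = bisect.bisect_left([k for k, _ in rows], ck)
        if i < len(rows) and rows[i][0] == ck:
            rows[i][1].update(values)  # merge newly supplied columns into the existing row
        else:
            rows.insert(i, (ck, dict(values)))  # only the supplied data* columns are kept

    def read(self, baz_from=None, baz_to=None, **key):
        """Return the CQL rows of one partition, optionally sliced by baz."""
        missing = [c for c in PARTITION_KEY if c not in key]
        if missing:
            raise ValueError(f"read needs the full partition key, missing {missing}")
        pk = tuple(key[c] for c in PARTITION_KEY)
        return [
            (k, dict(cols))
            for k, cols in self.partitions.get(pk, [])
            if (baz_from is None or k >= baz_from) and (baz_to is None or k <= baz_to)
        ]


if __name__ == "__main__":
    t = ToyTable()
    t.write(bar="b1", boz="z1", baz="2016-09-30T10:00", data1="a")
    t.write(bar="b1", boz="z1", baz="2016-09-30T09:00", data2="b")
    print(t.node_for("b1", "z1"), t.read(bar="b1", boz="z1"))
```

Reading with only `bar` and `boz` returns every CQL row of that partition in `baz` order, and passing `baz_from`/`baz_to` narrows it to a time slice, loosely mirroring the "over time, or for a specific time" access pattern described above; omitting `baz` on a write or `boz` on a read raises `ValueError`.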