Re: Cassandra data model right definition

Jonathan Haddad Mon, 03 Oct 2016 09:34:24 -0700

Nobody is claiming Cassandra is a relational database. I'm not sure why that
keeps coming up.
On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxg...@gmail.com>
wrote:


> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
> proven in his seminal work on the relational model, equates the expressive
> power of relational algebra
> <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
> <https://en.wikipedia.org/wiki/Relational_calculus> (both of which,
> lacking recursion, are strictly less powerful than first-order logic
> <https://en.wikipedia.org/wiki/First-order_logic>).
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store"? Is it a logical construct or a
> physical one?
>
> Why isn't MongoDB a "row store"? I can drop a schema on top of MongoDB and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,40000;
>
> Which is much closer to how Cassandra *stores* its data.
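
To make the two depictions above concrete, here is a minimal Python sketch. It is a toy text encoding chosen only to mirror the examples in this message, not Cassandra's actual SSTable format; the function names `row_layout` and `cell_layout` are made up for illustration.

```python
# Toy encodings only: these mirror the text examples in the thread and are
# NOT Cassandra's real on-disk format. All names here are hypothetical.

def row_layout(rows):
    """Row-oriented: one record per row, all values stored together."""
    return [
        f"{rid:03d}:{emp},{last},{first},{salary};"
        for rid, emp, last, first, salary in rows
    ]

def cell_layout(rows):
    """Cell-oriented: one record per (row, value); the key is repeated
    for every value, and missing (None) values are not written at all."""
    out = []
    for rid, emp, *values in rows:
        for value in values:
            if value is not None:
                out.append(f"{rid:03d}:{emp},{value}")
    return out

if __name__ == "__main__":
    print(row_layout([(1, 10, "Smith", "Joe", 40000)]))
    print(cell_layout([(1, 10, "Smith", None, 40000)]))
```

In the second layout a row with a missing value simply produces fewer records, which is the property the sparse-field argument turns on.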
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented*
> store" - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" tables.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> Thrift users still think they have no schema (though they do), and
> Thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com>
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key;
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>   boz uuid,
>   baz timeuuid,
>   data1 text,
>   data2 text,
>   PRIMARY KEY ((bar, boz), baz)
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you choose not to
> define a data* field for a particular CQL row, then nothing is stored nor
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
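
The write and read rules described above can be sketched with a toy in-memory model in Python. This is an illustration under stated assumptions, not how Cassandra is implemented: `ToyTable` and its methods are hypothetical names, and plain strings and integers stand in for the uuid/timeuuid values of the schema.

```python
from collections import defaultdict

# Hypothetical toy model of: PRIMARY KEY ((bar, boz), baz)
DATA_COLUMNS = {"data1", "data2"}  # non-key columns declared by the schema

class ToyTable:
    def __init__(self):
        # partition key (bar, boz) -> {clustering key baz -> {column: value}}
        self.partitions = defaultdict(dict)

    def write(self, bar, boz, baz, **data):
        """A write must carry the full primary key; data* is optional."""
        if None in (bar, boz, baz):
            raise ValueError("the full primary key (bar, boz, baz) is required")
        unknown = set(data) - DATA_COLUMNS
        if unknown:
            # There is a schema: undeclared columns are rejected.
            raise ValueError(f"columns not in schema: {sorted(unknown)}")
        row = self.partitions[(bar, boz)].setdefault(baz, {})
        row.update(data)  # columns that were not supplied are simply absent

    def read_partition(self, bar, boz):
        """A read needs only the partition key; rows come back ordered by baz."""
        rows = self.partitions.get((bar, boz), {})
        return [(baz, dict(cols)) for baz, cols in sorted(rows.items())]

if __name__ == "__main__":
    t = ToyTable()
    t.write("a", "b", 2, data1="x")
    t.write("a", "b", 1)  # no data* columns: nothing is stored for them
    print(t.read_partition("a", "b"))
```

The model rejects undeclared columns yet stores nothing for omitted ones, which is the sense in which sparse storage and "schema-less" are different things.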
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored in the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, including clustering keys (this is
> optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz either
> over time, or for a specific time, with roughly the same number of disk
> seeks, with varying lengths on the disk scans.
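
A rough way to picture "same number of seeks, varying scan lengths" is a single partition kept sorted by its clustering key. The Python sketch below is only an analogy using a sorted list and binary search; `ToyPartition` is a hypothetical name, integers stand in for timeuuid values, and nothing here reflects Cassandra's real read path.

```python
import bisect

class ToyPartition:
    """Rows of one (bar, boz) partition, kept sorted by clustering key baz."""

    def __init__(self):
        self._keys = []  # sorted clustering-key values
        self._rows = {}  # clustering key -> row payload

    def put(self, baz, row):
        if baz not in self._rows:
            bisect.insort(self._keys, baz)  # keep keys in clustering order
        self._rows[baz] = row

    def get(self, baz):
        """Point read: one binary search (the 'seek'), then a scan of length 1."""
        i = bisect.bisect_left(self._keys, baz)
        if i < len(self._keys) and self._keys[i] == baz:
            return self._rows[baz]
        return None

    def slice(self, start, end):
        """Range read for start <= baz < end: the same kind of seek to the
        start position, followed by a longer contiguous scan."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

if __name__ == "__main__":
    p = ToyPartition()
    for ts in (30, 10, 20):
        p.put(ts, {"data1": str(ts)})
    print(p.get(20))
    print(p.slice(10, 30))
```

Both reads locate their starting position the same way; only the length of the contiguous scan that follows differs.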
>
> Hope that helps!
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com>
> wrote:
>
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com>
> wrote:
>
> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> A column store means storing the data as columns rather than as rows.
>
> In fact, C* stores the data as rows, and the data is partitioned by row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS.... Is it
> true for you also???
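
For reference, the row-versus-column distinction the question relies on can be shown in a few lines of Python. This is a generic illustration with made-up sample records and hypothetical function names, and it says nothing about any particular database's storage engine.

```python
# Hypothetical sample data, used only to contrast the two layouts.
records = [
    {"id": 1, "name": "Smith", "salary": 40000},
    {"id": 2, "name": "Jones", "salary": 50000},
]

def to_row_store(records):
    """Row-oriented: all fields of one record are stored contiguously."""
    return [tuple(r.values()) for r in records]

def to_column_store(records):
    """Column-oriented: all values of one field are stored contiguously."""
    return {field: [r[field] for r in records] for field in records[0]}

if __name__ == "__main__":
    print(to_row_store(records))
    print(to_column_store(records))
```

Reading one whole record is a single contiguous access in the first layout, while scanning one field across all records is a single contiguous access in the second.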
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada
