Re: Cassandra data model right definition

Edward Capriolo Mon, 03 Oct 2016 07:54:19 -0700

My original point can be summed up as:

Do not define cassandra in terms SMILES & METAPHORS. Such words include
"like" and "close relative".


For the specifics:

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

Lets draw some lines, a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd

Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
proven in his seminal work on the relational model, equates the expressive
power of relational algebra
<https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
<https://en.wikipedia.org/wiki/Relational_calculus> (both of which, lacking
recursion, are strictly less powerful thanfirst-order logic
<https://en.wikipedia.org/wiki/First-order_logic>).[*citation needed
<https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]

As the relational model started to become fashionable in the early 1980s,
Codd fought a sometimes bitter campaign to prevent the term being misused
by database vendors who had merely added a relational veneer to older
technology. As part of this campaign, he published his 12 rules
<https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
constituted a relational database. This made his position in IBM
increasingly difficult, so he left to form his own consulting company with
Chris Date and others.

Cassandra is not a relational database.

I am have attempted to illustrate that a "row store" is defined as well. I
do not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from calling it a "row
store""

What is the definition of "row store". Is it a logical construct or a
physical one?

Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
present it as rows and columns. It seems to pass the litmus test being
presented.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> Also every piece of techincal information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> Does it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> The never depict a scenario where a the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* it's data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here: http://www.planetcassandra.
>> org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com
>> > wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>
>>   boz uuid,
>>
>>   baz timeuuid,
>>   data1 text,
>>
>>   data2 text,
>>
>>   PRIMARY KEY ((bar, boz), baz)
>>
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you chose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store http://db-engines.com/
>> en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoritical question:
>> - Is Apache Cassandra really a column store?
>> Column store mean storing the data as column rather than as a rows.
>>
>> In fact C* store the data as row, and data is partionned with row key.
>>
>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it
>> true for you also???
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>> 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.b...@dbi-services.com
>> www.dbi-services.com
>>
>>
>>
>>
>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>> team
>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>
>>
>>
>>
>>
>>
>>
>>

Re: Cassandra data model right definition

Reply via email to