Re: Cassandra data model right definition

Benedict Elliott Smith Mon, 03 Oct 2016 15:47:27 -0700

I did not ascribe blame.  I only empathised with their predicament;  I
don't want to listen to either of us, either!






On 3 October 2016 at 19:45, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You know what? Don't "go low" and blame the recent un-subscriber on me.
>
> If you're so eager to deal with my pull requests, I would rather you review
> this one: https://issues.apache.org/jira/browse/CASSANDRA-10825
>
>
>
>
>
> On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Nobody is disputing that the docs can and should be improved to avoid
>> this misreading.  I've invited Ed to file a JIRA and/or pull request twice
>> now.
>>
>> You are of course just as welcome to do this.  Perhaps you will actually
>> do it, so we can all move on with our lives!
>>
>>
>>
>>
>> On 3 October 2016 at 17:45, Peter Lin <wool...@gmail.com> wrote:
>>
>>> I've met clients that read the cassandra docs and then said in a big
>>> meeting "it's just like relational database, it has tables just like
>>> sqlserver/oracle."
>>>
>>> I'm not putting words in other people's mouths either, but I've heard
>>> that said enough times to want to puke. Do the docs claim cassandra is
>>> relational? They absolutely don't make that claim, but the docs play
>>> loosey goosey with terminology. The end result is that it confuses new
>>> users who aren't experts, or technology managers who try to make a case
>>> for cassandra.
>>>
>>> We can make all the excuses we want, but that doesn't change the fact
>>> that the docs aren't user friendly. Writing great documentation is tough
>>> and most developers hate it. It's cuz we suck at it. There, I said it: "we
>>> SUCK at writing user friendly documentation". As many people have pointed
>>> out, it's not unique to Cassandra. 80% of the tech docs out there suck,
>>> starting with IBM at the top.
>>>
>>> Saying the docs suck isn't an indictment of anyone, it's just the
>>> reality of writing good documentation.
>>>
>>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>>> Nobody is claiming Cassandra is a relational database; I'm not sure why
>>>> that keeps coming up.
>>>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>> wrote:
>>>>
>>>>> My original point can be summed up as:
>>>>>
>>>>> Do not define cassandra in terms of SIMILES & METAPHORS. Such words
>>>>> include "like" and "close relative".
>>>>>
>>>>> For the specifics:
>>>>>
>>>>>
>>>>> Any relational db could (and I'm sure one does!) allow for sparse
>>>>> fields as well. MySQL can be backed by rocksdb now, does that make it not
>>>>> a row store?
>>>>>
>>>>>
>>>>> Let's draw some lines: a relational database is clearly defined.
>>>>>
>>>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>>>
>>>>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>>>>> result proven in his seminal work on the relational model, equates the
>>>>> expressive power of relational algebra
>>>>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>>>>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>>>>> which, lacking recursion, are strictly less powerful than first-order
>>>>> logic <https://en.wikipedia.org/wiki/First-order_logic>).[*citation
>>>>> needed <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>>>>>
>>>>> As the relational model started to become fashionable in the early
>>>>> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
>>>>> misused by database vendors who had merely added a relational veneer to
>>>>> older technology. As part of this campaign, he published his 12 rules
>>>>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>>>>> constituted a relational database. This made his position in IBM
>>>>> increasingly difficult, so he left to form his own consulting company with
>>>>> Chris Date and others.
>>>>>
>>>>> Cassandra is not a relational database.
>>>>>
>>>>> I have attempted to illustrate that a "row store" is defined as
>>>>> well. I do not believe Cassandra is a "row store".
>>>>>
>>>>>
>>>>>
>>>>> "Just because it uses log structured storage, sparse fields, and
>>>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>>>> store""
>>>>>
>>>>> What is the definition of "row store"? Is it a logical construct or a
>>>>> physical one?
>>>>>
>>>>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo
>>>>> and present it as rows and columns. It seems to pass the litmus test being
>>>>> presented.
>>>>>
>>>>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com>
>>>>> wrote:
>>>>>
>>>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>>>> structured by a schema with the data for each row stored together in each
>>>>> data file. Just because it uses log structured storage, sparse fields, and
>>>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>>>> store"
>>>>>
>>>>> Postgres added flexible storage through hstore, I don't hear anyone
>>>>> arguing that it needs to be renamed.
>>>>>
>>>>> Any relational db could (and I'm sure one does!) allow for sparse
>>>>> fields as well. MySQL can be backed by rocksdb now, does that make it not
>>>>> a row store?
>>>>>
>>>>> You're arguing that everything is wrong but you're not proposing an
>>>>> alternative, which is not productive.
>>>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Also, every piece of technical information that describes a row store
>>>>>
>>>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>>>
>>>>> depicts it like this:
>>>>>
>>>>> 001:10,Smith,Joe,40000;
>>>>> 002:12,Jones,Mary,50000;
>>>>> 003:11,Johnson,Cathy,44000;
>>>>> 004:22,Jones,Bob,55000;
>>>>>
>>>>>
>>>>>
>>>>> They never depict a scenario where the data looks like this on disk:
>>>>>
>>>>> 001:10,Smith
>>>>>
>>>>> 001:10,40000;
>>>>>
>>>>> Which is much closer to how Cassandra *stores* its data.
>>>>>
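A toy Python sketch of the two layouts contrasted in the message above. The function names (`row_oriented`, `cell_per_entry`) and the string format are assumptions made for illustration; they mirror the examples as drawn in the email and are not Cassandra's actual SSTable format.

```python
# Toy illustration only: these strings imitate the examples in the email,
# not any real on-disk format.
ROWS = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
]


def row_oriented(rows):
    """Serialize each record as one contiguous entry: rowid:v1,v2,...;"""
    return [
        f"{rowid}:{','.join(str(v) for v in values)};"
        for rowid, *values in rows
    ]


def cell_per_entry(rows):
    """Serialize each non-key value as its own entry, repeating row id and key."""
    entries = []
    for rowid, key, *values in rows:
        for value in values:
            entries.append(f"{rowid}:{key},{value};")
    return entries


if __name__ == "__main__":
    print("\n".join(row_oriented(ROWS)))
    print("\n".join(cell_per_entry(ROWS)))
```

In the first layout a record is read back with one contiguous scan; in the second, the key is repeated once per value, which is the point the message is making about the shape of the stored data.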
>>>>>
>>>>>
>>>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>>>> bened...@apache.org> wrote:
>>>>>
>>>>> Absolutely.  A "partitioned row store" is exactly what I would call
>>>>> it.  As it happens, our README thinks the same, which is fantastic.
>>>>>
>>>>> I thought I'd take a look at the rest of our cohort, and didn't get
>>>>> far before disappointment.  HBase literally calls itself a
>>>>> "*column-oriented* store" - which is so totally wrong it's
>>>>> simultaneously hilarious and tragic.
>>>>>
>>>>> I guess we can't blame the wider internet for
>>>>> misunderstanding/misnaming us poor "wide column stores" if even one of the
>>>>> major examples doesn't know what it, itself, is!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com>
>>>>> wrote:
>>>>>
>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>> store" which usually needs some extra explanation but is more accurate
>>>>> than "column family" or whatever other thrift era terminology people
>>>>> still use.
>>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>>> tables. This definition is closer to CQL and has some academic background
>>>>> (distributed hash table).
>>>>>
>>>>>
>>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>>> bened...@apache.org> wrote:
>>>>>
>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>> Only thrift users no longer think they have a schema (though they do), and
>>>>> thrift is being deprecated.
>>>>>
>>>>> I really wish everyone would kill the term "wide column store" with
>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>>
>>>>> Not only that, but people don't even seem to realise the term "column
>>>>> store" existed long before "wide column store" and the latter is often
>>>>> abbreviated to the former, as here:
>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>
>>>>> Since it no longer applies, let's all agree as a community to forget
>>>>> this awful nomenclature ever existed.
>>>>>
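The row-oriented versus column-oriented distinction drawn in the message above can be sketched in a few lines of Python. The sample records and the function names (`to_row_store`, `to_column_store`) are hypothetical and only illustrate the general idea; they say nothing about how any particular database lays out its files.

```python
# Hypothetical sample data, used only to contrast the two orientations.
RECORDS = [
    {"id": 1, "name": "Smith", "salary": 40000},
    {"id": 2, "name": "Jones", "salary": 50000},
]


def to_row_store(records):
    """Row-oriented: all fields of one record are kept together."""
    return [tuple(record.values()) for record in records]


def to_column_store(records):
    """Column-oriented: all values of one field are kept together."""
    return {name: [record[name] for record in records] for name in records[0]}


if __name__ == "__main__":
    print(to_row_store(RECORDS))
    print(to_column_store(RECORDS))
```

Fetching one whole record touches a single tuple in the first layout, while aggregating one field touches a single list in the second.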
>>>>>
>>>>>
>>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>>> joaq...@thelastpickle.com> wrote:
>>>>>
>>>>> Hi Mehdi,
>>>>>
>>>>> I can help clarify a few things.
>>>>>
>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>> can have 2 billion columns, but in practice it shouldn't have more than
>>>>> 100 million columns.
>>>>>
>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>> keys. Together, the partition key(s) and clustering key(s) form the
>>>>> primary key.
>>>>>
>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>> key; when reading from Cassandra, however, you only need to provide the
>>>>> full partition key.
>>>>>
>>>>> When you only provide the partition key for a read operation, you're
>>>>> able to return all columns that exist on that partition with low latency.
>>>>> These columns are displayed as "CQL rows" to make them easier to reason
>>>>> about.
>>>>>
>>>>> Consider the schema:
>>>>>
>>>>> CREATE TABLE foo (
>>>>>   bar uuid,
>>>>>   boz uuid,
>>>>>   baz timeuuid,
>>>>>   data1 text,
>>>>>   data2 text,
>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>> );
>>>>>
>>>>>
>>>>> When you write to Cassandra you will need to send bar, boz, and baz
>>>>> and optionally data*, if it's relevant for that CQL row. If you choose not
>>>>> to set a data* field for a particular CQL row, then nothing is stored
>>>>> or allocated on disk for it. But I wouldn't consider that caveat to be
>>>>> "schema-less".
>>>>>
>>>>> However, all writes to the same bar/boz will end up on the same
>>>>> Cassandra replica set (a configurable number of nodes) and be stored in
>>>>> the same place(s) on disk within the SSTable(s). And on disk, each field
>>>>> that's not a partition key is stored as a column, including clustering
>>>>> keys (this is optimized in Cassandra 3+, but now we're getting deep into
>>>>> internals).
>>>>>
>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>> either over time, or for a specific time, with roughly the same number of
>>>>> disk seeks, with varying lengths on the disk scans.
>>>>>
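The read and write rules described above for `PRIMARY KEY ((bar, boz), baz)` can be modelled with a small in-memory Python sketch. `ToyTable` and its methods are hypothetical names invented here, and plain integers stand in for the timeuuid clustering key; this is an assumption-laden model of the behaviour as described in the email, not Cassandra's implementation.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a table with PRIMARY KEY ((bar, boz), baz)."""

    def __init__(self):
        # partition key (bar, boz) -> {clustering key baz -> stored cells}
        self.partitions = defaultdict(dict)

    def write(self, bar, boz, baz, **cells):
        """A write needs the full primary key; data columns are optional."""
        row = self.partitions[(bar, boz)].setdefault(baz, {})
        # Columns that are not supplied are simply absent, not stored as nulls.
        row.update({name: value for name, value in cells.items() if value is not None})

    def read(self, bar, boz, baz_from=None, baz_to=None):
        """A read needs only the partition key; rows come back ordered by baz."""
        partition = self.partitions.get((bar, boz), {})
        return [
            (baz, dict(cells))
            for baz, cells in sorted(partition.items())
            if (baz_from is None or baz >= baz_from)
            and (baz_to is None or baz <= baz_to)
        ]


if __name__ == "__main__":
    table = ToyTable()
    table.write("a", "b", 2, data1="x")
    table.write("a", "b", 1, data1="y", data2="z")
    print(table.read("a", "b"))              # whole partition, ordered by baz
    print(table.read("a", "b", baz_from=2))  # slice on the clustering key
```

Keeping each partition's rows ordered by the clustering key is what makes "all activity over time" and "activity at a specific time" both a lookup in one partition followed by a scan of varying length.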
>>>>> Hope that helps!
>>>>>
>>>>> Joaquin Casares
>>>>> Consultant
>>>>> Austin, TX
>>>>>
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com>
>>>>> wrote:
>>>>>
>>>>> Cassandra is a Wide Column Store
>>>>> http://db-engines.com/en/system/Cassandra
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 30 September 2016 at 18:24, Mehdi Bada
>>>>> <mehdi.b...@dbi-services.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a theoretical question:
>>>>> - Is Apache Cassandra really a column store?
>>>>> A column store means storing the data as columns rather than as rows.
>>>>>
>>>>> In fact, C* stores the data as rows, and data is partitioned by row key.
>>>>>
>>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is
>>>>> that true for you also?
>>>>>
>>>>> Many thanks in advance for your reply
>>>>>
>>>>> Best Regards
>>>>> Mehdi Bada
>>>>> ----
>>>>>
>>>>> *Mehdi Bada* | Consultant
>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>> 96 15
>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>> mehdi.b...@dbi-services.com
>>>>> www.dbi-services.com
>>>>>
>>>
>>
>
