Re: Cassandra data model right definition

Benedict Elliott Smith Fri, 30 Sep 2016 14:13:25 -0700

Absolutely.  A "partitioned row store" is exactly what I would call it.  As
it happens, our README thinks the same, which is fantastic.


I thought I'd take a look at the rest of our cohort, and didn't get far
before disappointment.  HBase literally calls itself a "*column-oriented*
store", which is so totally wrong it's simultaneously hilarious and tragic.

I guess we can't blame the wider internet for misunderstanding/misnaming us
poor "wide column stores" if even one of the major examples doesn't know
what it, itself, is!




On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store",
> which needs some extra explanation but is more accurate than "column
> family" or whatever other Thrift-era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> Thrift users still think they don't have a schema (though they do), and
>>> Thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
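This is a toy Python illustration of the distinction. The records are hypothetical, and it shows layout only, not any real engine's file format. It serialises the same three records row by row and then column by column. Reading one whole record is a contiguous run only in the first layout. Scanning one column is contiguous only in the second.

```python
# Toy illustration: the same records laid out row-wise vs column-wise.
# Models storage *layout* only, not Cassandra's or HBase's actual format.
records = [
    {"id": 1, "name": "ada", "city": "london"},
    {"id": 2, "name": "bob", "city": "paris"},
    {"id": 3, "name": "cy", "city": "austin"},
]
columns = ["id", "name", "city"]


def row_oriented(recs, cols):
    """Row store: all values of one record sit next to each other."""
    return [rec[c] for rec in recs for c in cols]


def column_oriented(recs, cols):
    """Column store: all values of one column sit next to each other."""
    return [rec[c] for c in cols for rec in recs]


print(row_oriented(records, columns))
# [1, 'ada', 'london', 2, 'bob', 'paris', 3, 'cy', 'austin']
print(column_oriented(records, columns))
# [1, 2, 3, 'ada', 'bob', 'cy', 'london', 'paris', 'austin']
```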
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store", and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a wide column store. Theoretically a
>>>> partition (a "row" in Thrift terms) can have 2 billion columns, but in
>>>> practice it shouldn't have more than 100 million columns.
>>>>
>>>> Cassandra assigns data to nodes based on the partition key(s), and
>>>> optionally lets you define zero or more clustering keys. Together, the
>>>> partition key(s) and clustering key(s) form the primary key.
>>>>
>>>> When writing to Cassandra, you need to provide the full primary key;
>>>> when reading, you only need to provide the full partition key.
>>>>
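Here is a minimal in-memory sketch of the rules above, written as a toy Python model rather than Cassandra's implementation. The key layout mirrors the `foo` table below. Small integers stand in for the uuid and timeuuid values, and the class and method names are hypothetical.

- Writes must carry the whole primary key.
- Reads need exactly the partition key.
- Rows come back ordered by clustering key.

```python
# Hypothetical schema mirroring PRIMARY KEY ((bar, boz), baz):
# (bar, boz) is the composite partition key, baz is the clustering key.
PARTITION_KEY = ("bar", "boz")
CLUSTERING_KEY = ("baz",)
PRIMARY_KEY = PARTITION_KEY + CLUSTERING_KEY


class PartitionedRowStore:
    """Toy model: partition key -> {clustering key -> column values}."""

    def __init__(self):
        self.partitions = {}

    def insert(self, row):
        # A write must supply every primary-key column (partition + clustering).
        missing = [c for c in PRIMARY_KEY if c not in row]
        if missing:
            raise ValueError(f"missing primary key columns: {missing}")
        pk = tuple(row[c] for c in PARTITION_KEY)
        ck = tuple(row[c] for c in CLUSTERING_KEY)
        values = {c: v for c, v in row.items() if c not in PRIMARY_KEY}
        # Upsert: columns written for an existing row are merged into it.
        self.partitions.setdefault(pk, {}).setdefault(ck, {}).update(values)

    def read_partition(self, **key):
        # A read needs the full partition key (this toy has no clustering filters).
        if set(key) != set(PARTITION_KEY):
            raise ValueError(f"a read needs exactly the partition key {PARTITION_KEY}")
        pk = tuple(key[c] for c in PARTITION_KEY)
        rows = self.partitions.get(pk, {})
        # Rows come back ordered by clustering key, presented as CQL rows.
        return [{**key, **dict(zip(CLUSTERING_KEY, ck)), **vals}
                for ck, vals in sorted(rows.items())]


store = PartitionedRowStore()
store.insert({"bar": 1, "boz": 2, "baz": 20, "data1": "b"})
store.insert({"bar": 1, "boz": 2, "baz": 10, "data1": "a"})
print(store.read_partition(bar=1, boz=2))
# [{'bar': 1, 'boz': 2, 'baz': 10, 'data1': 'a'}, {'bar': 1, 'boz': 2, 'baz': 20, 'data1': 'b'}]
```

Calling `store.insert({"bar": 1, "boz": 2})` raises `ValueError`, because the clustering key `baz` is missing.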
>>>> When you provide only the partition key for a read, you can retrieve all
>>>> the columns stored in that partition with low latency. These columns are
>>>> presented as "CQL rows" to make them easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you need to send bar, boz and baz, and
>>>> optionally data1/data2 if they're relevant for that CQL row. If you choose
>>>> not to set a data* field for a particular CQL row, nothing is stored or
>>>> allocated on disk for it. But I wouldn't consider that caveat to be
>>>> "schema-less".
>>>>
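This rough Python sketch models why unset fields cost nothing. It is loosely modelled on the pre-3.0 idea of cells named by (clustering value, column name), not on the real SSTable format. It ignores timestamps, TTLs and other per-row metadata. Each write emits a cell only for the columns actually supplied. Reading the partition's sorted cells back and grouping them by clustering value yields the "CQL rows". The function names are hypothetical.

```python
from itertools import groupby

REGULAR_COLUMNS = ("data1", "data2")  # the non-key columns of the foo table


def to_cells(baz, **values):
    """Turn one CQL-row write into cells; unset columns produce no cell at all."""
    return [((baz, col), values[col]) for col in REGULAR_COLUMNS
            if values.get(col) is not None]


def to_cql_rows(cells):
    """Regroup a partition's cells, sorted by name, into CQL rows keyed by baz."""
    rows = []
    for baz, group in groupby(sorted(cells), key=lambda cell: cell[0][0]):
        row = {"baz": baz}
        row.update({col: value for (_, col), value in group})
        rows.append(row)
    return rows


partition = to_cells(1, data1="x") + to_cells(2, data1="y", data2="z")
print(partition)
# [((1, 'data1'), 'x'), ((2, 'data1'), 'y'), ((2, 'data2'), 'z')]
print(to_cql_rows(partition))
# [{'baz': 1, 'data1': 'x'}, {'baz': 2, 'data1': 'y', 'data2': 'z'}]
```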
>>>> Moreover, all writes to the same bar/boz end up on the same Cassandra
>>>> replica set (a configurable number of nodes) and are stored in the same
>>>> place(s) on disk within the SSTable(s). On disk, each field that's not
>>>> part of the partition key is stored as a column, including clustering
>>>> keys (this is optimized in Cassandra 3.0+, but now we're getting deep
>>>> into internals).
>>>>
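The replica placement described above can be sketched as a token ring, which is also the distributed-hash-table idea DuyHai mentions. The full partition key (bar, boz) is hashed to a token. The first node at or after that token gets a copy, and the remaining replicas are taken by walking clockwise. This is roughly what SimpleStrategy does. Everything here is an illustrative assumption:

- a hypothetical four-node ring,
- MD5 in place of Murmur3,
- no vnodes,
- no rack or datacenter awareness.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 4-node ring with evenly spaced tokens in [0, 2**64). Real
# clusters use Murmur3Partitioner tokens (a signed 64-bit range) and
# usually many vnodes per node.
RING = [(0, "node-a"), (2**62, "node-b"), (2**63, "node-c"), (3 * 2**62, "node-d")]


def token_for(partition_key):
    """Hash the whole partition key to a token (MD5 as a stand-in for Murmur3)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")  # 0 <= token < 2**64


def replicas(partition_key, rf=3):
    """SimpleStrategy-like placement: the first node whose token >= the key's
    token, then the next distinct nodes clockwise around the ring."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token_for(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(min(rf, len(RING)))]


# Every write for the same (bar, boz) lands on the same replica set.
print(replicas(("bar-1", "boz-1")))
```

Because only the partition key feeds the hash, changing `baz` never moves a row to a different set of nodes.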
>>>> In this way you can get fast responses for all activity for bar/boz,
>>>> either over a time range or at a specific time, with roughly the same
>>>> number of disk seeks; only the length of the disk scan varies.
>>>>
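Here is a small Python sketch of why both query shapes cost about the same to start. It shows one partition's rows kept sorted by clustering key. Locating a time range or a single point takes one binary search, and only the length of the contiguous scan afterwards differs. The integers standing in for `baz` timeuuids are a simplification. The helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# One partition's rows, kept sorted by clustering key. Integers stand in
# for the timeuuid "baz" values (a simplification for illustration).
partition = [(10, "login"), (20, "click"), (30, "click"), (40, "logout")]
keys = [k for k, _ in partition]


def slice_rows(start, end):
    """Rows with start <= clustering key <= end: binary search, then one scan."""
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return partition[lo:hi]


def row_at(key):
    """Single-row lookup on the clustering key."""
    i = bisect_left(keys, key)
    return partition[i] if i < len(keys) and keys[i] == key else None


print(slice_rows(15, 35))  # [(20, 'click'), (30, 'click')]
print(row_at(40))          # (40, 'logout')
```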
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com>
>>>> wrote:
>>>>
>>>>> Cassandra is a Wide Column Store:
>>>>> http://db-engines.com/en/system/Cassandra
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a theoretical question:
>>>>>> - Is Apache Cassandra really a column store?
>>>>>> A column store means storing the data by column rather than by row.
>>>>>>
>>>>>> In fact, C* stores the data as rows, and the data is partitioned by
>>>>>> row key.
>>>>>>
>>>>>> So, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>>>>>> true for you too?
>>>>>>
>>>>>> Many thanks in advance for your reply
>>>>>>
>>>>>> Best Regards
>>>>>> Mehdi Bada
>>>>>> ----
>>>>>>
>>>>>> *Mehdi Bada* | Consultant
>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>>> 96 15
>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>> mehdi.b...@dbi-services.com
>>>>>> www.dbi-services.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
