Okay. Let's assume that with denormalization you have to do 1000 writes (and one read per user), and with normalization you have to do 1 write (and maybe 1000 reads for each user).
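As a sanity check, that read-vs-write fan-out can be put into a toy model. The throughput figures below are illustrative assumptions (roughly matching the 10-20ms and 20-50ms estimates in this thread), not measurements of any real cluster:

```python
# Toy model of the visibility gap for each design. The rates are
# assumptions for illustration only, not benchmarks.

def gap_denormalized_ms(n_writes: int, writes_per_ms: float) -> float:
    # All n_writes copies must land before every user sees the change;
    # after that, each user's read is a cheap single-partition lookup.
    return n_writes / writes_per_ms

def gap_normalized_ms(n_reads: int, reads_per_ms: float) -> float:
    # The single write is visible almost immediately, but each user's
    # next read must touch n_reads partitions before the change surfaces.
    return n_reads / reads_per_ms

# Assumed rates: ~1000 concurrent writes complete in 15ms; reads ~2x slower.
WRITES_PER_MS = 1000 / 15
READS_PER_MS = 1000 / 35

print(round(gap_denormalized_ms(1000, WRITES_PER_MS)))  # 15 (paid once, at write time)
print(round(gap_normalized_ms(1000, READS_PER_MS)))     # 35 (paid by every reader)
```

The point of the model is that the denormalized design pays its fan-out cost once per update, while the normalized design pays its fan-out cost on every user's read.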
If you execute the writes in the most optimal way (batched by partition, if applicable, with separate, concurrent requests per partition), I think it's reasonable to say you can do 1000 writes in 10 to 20ms. Doing 1000 reads is going to take longer; exactly how long depends on your system (SSDs or not, whether the data is cached, etc.), but it will probably take at least 2x as long as the writes.

So, with denormalization, it takes 10 to 20ms for all users to see the change (with a median somewhere around 5 to 10ms). With normalization, all users *could* see the update almost immediately, because it's only one write. However, each of your users needs to read 1000 partitions, which takes, say, 20 to 50ms. So effectively, they won't see the changes for 20 to 50ms, unless they know to read the details for that exact alert.

On Wed, Feb 4, 2015 at 11:57 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> I don't want to optimize for reads or writes, I want to optimize for
> having the smallest gap possible between the time I write and the time
> I read.
> []s
>
> From: user@cassandra.apache.org
> Subject: Re: to normalize or not to normalize - read penalty vs write penalty
>
> Roughly how often do you expect to update alerts? How often do you
> expect to read the alerts? I suspect you'll be doing 100x more reads
> (or more), in which case optimizing for reads is definitely the right
> choice.
>
> On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON)
> <mvallemil...@bloomberg.net> wrote:
>
>> Hello everyone,
>>
>> I am thinking about the architecture of my application using Cassandra
>> and I am asking myself whether or not I should normalize an entity.
>>
>> I have users and alerts in my application and, for each user, several
>> alerts. The first model that came to mind was creating an "alerts" CF
>> with user-id as part of the partition key.
>> This way, I can have fast writes, and my reads will be fast too, as I
>> will always read from a single partition.
>>
>> However, I later received a requirement that made my life more
>> complicated: alerts can be shared by 1000s of users, and alerts can
>> change. I am building a real-time app, and if I change an alert, all
>> users related to it should see the change.
>>
>> Suppose I keep things denormalized: every time an alert changes, I
>> would need to write to 1000s of records, so my write performance would
>> suffer whenever an alert changes.
>>
>> On the other hand, I could have one CF for users-alerts and another
>> for alert details. Then, at read time, I would need to query 1000s of
>> alerts for a given user.
>>
>> In both situations, there is a gap between the time data is written
>> and the time it's available to be read.
>>
>> I understand that not normalizing will make me use more disk space,
>> but once the data is written, I will be able to perform as many reads
>> as I want with no penalty in performance. I also understand that
>> writes are faster than reads in Cassandra, so the gap would be smaller
>> with the first solution.
>>
>> I would be glad to hear thoughts from the community.
>>
>> Best regards,
>> Marcelo Valle.
>>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>

--
Tyler Hobbs
DataStax <http://datastax.com/>
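To make the two designs in this thread concrete, here is a toy in-memory sketch: plain dicts stand in for Cassandra partitions, and all names (alerts_by_user, alert_subscribers, user_alert_ids, alert_details) are illustrative, not an established schema. It shows where each design pays its penalty:

```python
# Denormalized: each user's partition holds a full copy of every alert.
alerts_by_user = {}     # user_id -> {alert_id: alert_body}
alert_subscribers = {}  # alert_id -> set of user_ids (needed to fan out updates)

def denorm_update_alert(alert_id, body):
    # Write penalty: one write per subscribed user (1000s of partitions).
    for user_id in alert_subscribers.get(alert_id, ()):
        alerts_by_user.setdefault(user_id, {})[alert_id] = body

def denorm_read_alerts(user_id):
    # Read is a single-partition lookup.
    return alerts_by_user.get(user_id, {})

# Normalized: user partitions hold only alert ids; details live elsewhere.
user_alert_ids = {}     # user_id -> set of alert_ids
alert_details = {}      # alert_id -> alert_body

def norm_update_alert(alert_id, body):
    # Write penalty: a single write, visible to everyone at once.
    alert_details[alert_id] = body

def norm_read_alerts(user_id):
    # Read penalty: one lookup per alert (1000s of partitions in Cassandra).
    return {aid: alert_details[aid]
            for aid in user_alert_ids.get(user_id, ())
            if aid in alert_details}
```

In a real cluster the denormalized fan-out would be issued as concurrent per-partition requests (as described earlier in the thread), and the normalized read would be 1000s of separate partition reads; the sketch only captures which side of the system does the 1000x work.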