Perfect, Tyler.

My feeling was leading me to this, but I wasn't able to put it into words as 
you did. 

Thanks a lot for the message.


From: user@cassandra.apache.org 
Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Okay.  Let's assume with denormalization you have to do 1000 writes (and one 
read per user) and with normalization you have to do 1 write (and maybe 1000 
reads for each user).

If you execute the writes in the most optimal way (batched by partition, if 
applicable, and separate, concurrent requests per partition), I think it's 
reasonable to say you can do 1000 writes in 10 to 20ms.
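
As a rough sketch of what I mean (using the DataStax Python driver; the 
keyspace, table, and column names here are placeholders, not from this 
thread), the concurrent per-partition writes could look like this:

    # Hypothetical fan-out update: one asynchronous write per user partition.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("myapp")  # placeholder cluster/keyspace
    update = session.prepare(
        "UPDATE alerts_by_user SET details = ? WHERE user_id = ? AND alert_id = ?")

    def fan_out(user_ids, alert_id, details):
        # Fire all writes concurrently, one request per partition...
        futures = [session.execute_async(update, (details, uid, alert_id))
                   for uid in user_ids]
        # ...then block until every write is acknowledged.
        for f in futures:
            f.result()

If several rows shared a partition, you would group those into a single 
batch per partition instead of separate statements.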

Doing 1000 reads is going to take longer.  Exactly how long depends on your 
systems (SSDs or not, whether the data is cached, etc).  But this is probably 
going to take at least 2x as long as the writes. 

So, with denormalization, it's 10 to 20ms for all users to see the change (with 
a median somewhere around 5 to 10ms).  With normalization, all users *could* 
see the update almost immediately, because it's only one write.  However, each 
of your users needs to read 1000 partitions, which takes, say, 20 to 50ms.  So 
effectively, they won't see the changes for 20 to 50ms, unless they know to 
read the details for that exact alert.

On Wed, Feb 4, 2015 at 11:57 AM, Marcelo Valle (BLOOMBERG/ LONDON) 
<mvallemil...@bloomberg.net> wrote:

I don't want to optimize for reads or writes; I want to optimize for having 
the smallest possible gap between the time I write and the time I read.
Regards,

From: user@cassandra.apache.org 
Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Roughly how often do you expect to update alerts?  How often do you expect to 
read the alerts?  I suspect you'll be doing 100x more reads (or more), in which 
case optimizing for reads is definitely the right choice.

On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON) 
<mvallemil...@bloomberg.net> wrote:

Hello everyone,

I am thinking about the architecture of my application using Cassandra, and I 
am asking myself whether or not I should normalize an entity.

I have users and alerts in my application, and each user has several alerts. 
The first model that came to mind was creating an "alerts" CF with user-id as 
part of the partition key. This way, my writes are fast and my reads will be 
fast too, as I will always read from a single partition.
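
For concreteness, a sketch of that first model (table and column names are 
invented for the example, using the Python driver):

    # Denormalized model: each user's alerts live in that user's partition.
    import uuid
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("myapp")  # placeholder
    session.execute("""
        CREATE TABLE IF NOT EXISTS alerts_by_user (
            user_id  uuid,
            alert_id timeuuid,
            details  text,
            PRIMARY KEY (user_id, alert_id)
        )""")

    # All alerts for one user come from a single partition:
    user_id = uuid.uuid4()  # stand-in for a real user id
    rows = session.execute(
        "SELECT alert_id, details FROM alerts_by_user WHERE user_id = %s",
        (user_id,))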

However, I later received a requirement that made my life more complicated: 
alerts can be shared by 1000s of users, and alerts can change. I am building a 
real-time app, and if an alert changes, all users related to it should see the 
change. 

Suppose I keep things denormalized - whenever an alert changes, I would need 
to write to 1000s of records, so every change to an alert would hurt my write 
performance. 

On the other hand, I could have a CF for users-alerts and another for alert 
details. Then, at read time, I would need to query 1000s of alerts for a given 
user.
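
A sketch of that normalized alternative (again, the names are invented for 
illustration):

    # Normalized model: user -> alert ids in one table, details in another.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("myapp")  # placeholder
    session.execute("""
        CREATE TABLE IF NOT EXISTS user_alerts (
            user_id  uuid,
            alert_id timeuuid,
            PRIMARY KEY (user_id, alert_id)
        )""")
    session.execute("""
        CREATE TABLE IF NOT EXISTS alert_details (
            alert_id timeuuid PRIMARY KEY,
            details  text
        )""")

    lookup = session.prepare(
        "SELECT details FROM alert_details WHERE alert_id = ?")

    def read_alerts(user_id):
        # One partition read for the ids, then one concurrent read per
        # alert: this is the 1000-read fan-out at read time.
        ids = [r.alert_id for r in session.execute(
            "SELECT alert_id FROM user_alerts WHERE user_id = %s", (user_id,))]
        futures = [session.execute_async(lookup, (i,)) for i in ids]
        return [f.result().one() for f in futures]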

In both situations, there is a gap between the time data is written and the 
time it's available to be read. 

I understand that not normalizing will use more disk space, but once the data 
is written, I will be able to perform as many reads as I want with no 
performance penalty. Also, I understand writes are faster than reads in 
Cassandra, so the gap would be smaller in the first solution.

I would be glad to hear thoughts from the community.

Best regards,
Marcelo Valle.


-- 
Tyler Hobbs
DataStax



