Perfect, Tyler. My feeling was leading me to this, but I wasn't able to put it into words the way you did.

Thanks a lot for the message.

From: user@cassandra.apache.org
Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Okay. Let's assume with denormalization you have to do 1000 writes (and one read per user), and with normalization you have to do 1 write (and maybe 1000 reads for each user).

If you execute the writes in the most optimal way (batched by partition, if applicable, and separate, concurrent requests per partition), I think it's reasonable to say you can do 1000 writes in 10 to 20ms. Doing 1000 reads is going to take longer. Exactly how long depends on your systems (SSDs or not, whether the data is cached, etc.), but it's probably going to take at least 2x as long as the writes.

So, with denormalization, it's 10 to 20ms for all users to see the change (with a median somewhere around 5 to 10ms). With normalization, all users *could* see the update almost immediately, because it's only one write. However, each of your users needs to read 1000 partitions, which takes, say, 20 to 50ms. So effectively, they won't see the changes for 20 to 50ms, unless they know to read the details for that exact alert.

On Wed, Feb 4, 2015 at 11:57 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

I don't want to optimize for reads or writes; I want to optimize for having the smallest possible gap between the time I write and the time I read.

[]s

From: user@cassandra.apache.org
Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Roughly how often do you expect to update alerts? How often do you expect to read the alerts? I suspect you'll be doing 100x more reads (or more), in which case optimizing for reads is definitely the right choice.

On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

Hello everyone,

I am thinking about the architecture of my application using Cassandra, and I am asking myself whether or not I should normalize an entity.

I have users and alerts in my application, and for each user, several alerts. The first model that came to mind was creating an "alerts" CF with user-id as part of the partition key. This way I can have fast writes, and my reads will be fast too, as I will always read from a single partition.

However, I later received a requirement that made my life more complicated: alerts can be shared by 1000s of users, and alerts can change. I am building a real-time app, and if I change an alert, all users related to it should see the change.

Suppose I keep things denormalized: every time an alert changes, I would need to write to 1000s of records, so my write performance would be affected each time I change an alert. On the other hand, I could have one CF for users-alerts and another for alert details; then, at read time, I would need to query 1000s of alerts for a given user.

In both situations, there is a gap between the time data is written and the time it's available to be read. I understand that not normalizing will make me use more disk space, but once the data is written, I will be able to perform as many reads as I want with no performance penalty. I also understand that writes are faster than reads in Cassandra, so the gap would be smaller in the first solution.

I would be glad to hear thoughts from the community.

Best regards,
Marcelo Valle

--
Tyler Hobbs
DataStax
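[Editor's sketch] To make the write-side pattern Tyler describes concrete, here is a minimal Python sketch using the DataStax cassandra-driver. All table, column, and keyspace names (alerts_by_user, user_alerts, alert_details, alerts_app) are assumptions invented for the example, not taken from the thread. It shows the denormalized fan-out (one concurrent write per user partition when an alert changes) and, for comparison, the normalized read path (one lookup of the user's alert ids, then concurrent reads of each alert's detail partition).

    # Sketch only: schema and names below are hypothetical.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("alerts_app")  # hypothetical keyspace

    # Denormalized model: one row per (user, alert). Reading a user's
    # alerts hits a single partition, but changing a shared alert
    # fans out to every subscribed user's partition.
    update_stmt = session.prepare(
        "UPDATE alerts_by_user SET details = ? "
        "WHERE user_id = ? AND alert_id = ?"
    )

    def fan_out_alert_update(alert_id, new_details, user_ids):
        """Write the changed alert into each user's partition as
        separate, concurrent requests (one partition per statement)."""
        params = [(new_details, uid, alert_id) for uid in user_ids]
        results = execute_concurrent_with_args(
            session, update_stmt, params,
            concurrency=100, raise_on_first_error=False,
        )
        failures = [r for r in results if not r.success]
        if failures:
            raise RuntimeError("%d writes failed" % len(failures))

    # Normalized model: one write per alert change, but a user's read
    # must touch one partition per alert they subscribe to.
    ids_stmt = session.prepare(
        "SELECT alert_id FROM user_alerts WHERE user_id = ?"
    )
    details_stmt = session.prepare(
        "SELECT * FROM alert_details WHERE alert_id = ?"
    )

    def read_user_alerts(user_id):
        alert_ids = [row.alert_id
                     for row in session.execute(ids_stmt, [user_id])]
        # Potentially ~1000 partitions read concurrently; the thread
        # estimates this at roughly 2x the cost of the write fan-out.
        results = execute_concurrent_with_args(
            session, details_stmt, [(aid,) for aid in alert_ids]
        )
        return [r.result_or_exc for r in results]

Note that each statement in fan_out_alert_update touches exactly one partition, so separate concurrent requests are the right shape here; an unlogged batch only helps when several rows share a partition key, which is what "batched by partition, if applicable" refers to.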
Thanks a lot for the message. From: user@cassandra.apache.org Subject: Re: to normalize or not to normalize - read penalty vs write penalty Okay. Let's assume with denormalization you have to do 1000 writes (and one read per user) and with normalization you have to do 1 write (and maybe 1000 reads for each user). If you execute the writes in the most optimal way (batched by partition, if applicable, and separate, concurrent requests per partition), I think it's reasonable to say you can do 1000 writes in 10 to 20ms. Doing 1000 reads is going to take longer. Exactly how long depends on your systems (SSDs or not, whether the data is cached, etc). But this is probably going to take at least 2x as long as the writes. So, with denormalization, it's 10 to 20ms for all users to see the change (with a median somewhere around 5 to 10ms). With normalization, all users *could* see the update almost immediately, because it's only one write. However, each of your users needs to read 1000 partitions, which takes, say 20 to 50ms. So effectively, they won't see the changes for 20 to 50ms, unless they know to read the details for that exact alert. On Wed, Feb 4, 2015 at 11:57 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote: I don't want to optimize for reads or writes, I want to optimize for having the smallest gap possible between the time I write and the time I read. []s From: user@cassandra.apache.org Subject: Re: to normalize or not to normalize - read penalty vs write penalty Roughly how often do you expect to update alerts? How often do you expect to read the alerts? I suspect you'll be doing 100x more reads (or more), in which case optimizing for reads is the definitely right choice. On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote: Hello everyone, I am thinking about the architecture of my application using Cassandra and I am asking myself if I should or shouldn't normalize an entity. I have users and alerts in my application and for each user, several alerts. The first model which came into my mind was creating an "alerts" CF with user-id as part of the partition key. This way, I can have fast writes and my reads will be fast too, as I will always read per partition. However, I received a requirement later that made my life more complicated. Alerts can be shared by 1000s of users and alerts can change. I am building a real time app and if I change an alert, all users related to it should see the change. Suppose I want to keep thing not normalized - always an alert changes I would need to do a write on 1000s of records. This way my write performance everytime I change an alert would be affected. On the other hand, I could have a CF for users-alerts and another for alert details. Then, at read time, I would need to query 1000s of alerts for a given user. In both situations, there is a gap between the time data is written and the time it's available to be read. I understand not normalizing will make me use more disk space, but once data is written once, I will be able to perform as many reads as I want to with no penalty in performance. Also, I understand writes are faster than reads in Cassandra, so the gap would be smaller in the first solution. I would be glad in hearing thoughts from the community. Best regards, Marcelo Valle. -- Tyler Hobbs DataStax -- Tyler Hobbs DataStax