Thank you for your time, Jeff, very helpful. I couldn't find anything out there about the subject and I suspected that this could be the case.
Regarding the clustering key in this case: Back in the RDBMS world, you will always assign a sequential (or as sequential as possible) clustering key to a table to minimize fragmentation and increase the speed of the insertions. In the Cassandra world, does the same apply to the clustering key? For example, is it a good idea to assign a UUID to a clustering key, or would a timestamp be a better choice? I am thinking that partitions need to keep some sort of binary index for the clustering keys and for relatively large partitions it can be relatively expensive to maintain.

F Javier Pareja

On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo <r...@pythian.com> wrote:
>
>> Hi Jeff,
>>
>> Could you expand: "Tables without clustering keys are often deceptively
>> expensive to compact, as a lot of work (relative to the other cell
>> boundaries) happens on partition boundaries." This is something I didn't
>> know and highly interesting to know more about!
>
> We do a lot "by partition". We build column indexes by partition. We
> update the partition index on each partition. We invalidate key cache by
> partition. They're not super expensive, but they take time, and tables with
> tiny partitions can actually be slower to compact.
>
> There's no magic cutoff where it does/doesn't make sense, my comment is
> mostly a warning that the edges of the "normal" use cases tend to be less
> optimized than the common case. Having a table with a hundred billion
> records, where the key is numeric and the value is a single byte (let's say
> you're keeping track of whether or not a specific sensor has ever detected
> some magic event, and you have 100B sensors, that table will be close to
> the worst-case example of this behavior).
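
To get a feel for the scale of the per-partition work described in the quoted reply, here is a back-of-envelope sketch in Python. It only counts partition boundaries for the 100B-sensor example. It does not model Cassandra's actual compaction cost. The function name and the bucket size of 1,000 sensors per partition are hypothetical, chosen purely for illustration.

```python
def partition_count(total_rows: int, rows_per_partition: int) -> int:
    """Number of partitions (and so partition boundaries) needed to hold
    total_rows when each partition holds up to rows_per_partition rows.

    Illustrative arithmetic only, not a model of compaction cost.
    """
    if total_rows < 0 or rows_per_partition <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_partition > 0")
    # Integer ceiling division, which avoids float rounding on large counts.
    return -(-total_rows // rows_per_partition)


TOTAL_SENSORS = 100_000_000_000  # the 100B sensors from the quoted example

# No clustering key: every sensor is its own single-row partition.
one_row_per_partition = partition_count(TOTAL_SENSORS, 1)

# Hypothetical alternative: 1,000 sensors share a partition, with the
# sensor id acting as the clustering key inside that partition.
bucketed = partition_count(TOTAL_SENSORS, 1_000)

if __name__ == "__main__":
    print(f"one row per partition: {one_row_per_partition:,} partitions")
    print(f"1,000 rows per partition: {bucketed:,} partitions")
```

Under these assumptions the single-row layout has a thousand times more partition boundaries, each of which triggers the per-partition steps listed in the reply (column index, partition index update, key cache invalidation). Whether bucketing actually pays off depends on the read pattern and on partition size limits, which this sketch ignores.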