This is assuming each row is, like… I dunno, 10-1000 bytes. If you're storing a huge ~1 MB blob per row, use two tables for sure.
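To make that split concrete, here's roughly what it'd look like in CQL (table names, column names, and types here are made up for illustration; the thread doesn't specify them):

    -- Small, frequently-read columns stay together: rows remain ~10-1000 bytes
    CREATE TABLE item_meta (
        pk   text,
        ck   text,
        col1 text,
        col2 text,
        PRIMARY KEY (pk, ck)
    );

    -- The big payload lives in its own table, so reads of the small columns
    -- never have to scan past megabyte-sized cells
    CREATE TABLE item_blob (
        pk      text,
        ck      text,
        payload blob,
        PRIMARY KEY (pk, ck)
    );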
> On Jun 22, 2022, at 9:06 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> Ok, so here's how I would think about this.
>
> The writes don't matter. (There's a tiny, tiny bit of nuance in the
> one-table model, where you can contend when adding to the memtable, but the
> best Cassandra engineers on earth probably won't notice that unless you
> have really super-hot partitions, so ignore the write path.)
>
> The reads are where it changes.
>
> In both models, you'll use the partition index to seek to where the
> partition starts.
>
> In Model 2, table 1, if you query by ck+col1+… the read will load the
> column index and use that to jump to within 64 KB of the col1 value you
> specify.
>
> In Model 2, table 2, if you query by ck+col3+…, same thing - the column
> index can jump to within 64 KB.
>
> What you give up in Model 1 is the granularity of that jump. If you use
> Model 1 and col3 instead of col1, the read will have to scan the partition.
> In your case, with 80 rows, that may still be within that 64 KB block - you
> may not get more granular than that anyway. And even if it's slightly
> larger, you're probably going to be compressing 64 KB chunks - maybe you
> have to decompress one extra chunk on read if your 1000 rows go past 64 KB,
> but you likely won't actually notice. You're technically asking the server
> to read and skip data it doesn't need to return - it's not really the most
> efficient, but at that partition size it's noise. You could also just
> return all 80-100 rows, let the server do slightly less work, and filter
> client side - also valid, probably slightly worse than the server-side
> filter.
>
> Having one table instead of two, though, probably saves you a ton of disk
> space ($€£), and the lower disk space may also mean that data stays in page
> cache, so the extra read may not even go to disk anyway.
>
> So with your actual data shape, I imagine you won't really notice the
> nominal inefficiency of the first model, and I'd be inclined to do that
> until you demonstrate it won't work (I bet it works fine for a long, long
> time).
>
>> On Jun 22, 2022, at 7:11 PM, MyWorld <timeplus.1...@gmail.com> wrote:
>>
>> Hi Jeff,
>> Let me know how the number of rows has an impact here.
>> Maybe today I have 80-100 rows per partition. But what if I started
>> storing 2-4k rows per partition, while the total partition size is still
>> under 100 MB?
>>
>>> On Thu, Jun 23, 2022, 7:18 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>> How many rows per partition in each model?
>>>
>>>> On Jun 22, 2022, at 6:38 PM, MyWorld <timeplus.1...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Just a small query around data modelling.
>>>> Suppose we have to design the data model for 2 different use cases
>>>> which will query the data on the same set of (partition + clustering)
>>>> keys. Should we maintain a separate table for each, or a single table?
>>>>
>>>> Model 1 - Combined table
>>>> Table(Pk, CK, col1, col2, col3, col4, col5)
>>>>
>>>> Model 2 - Separate tables
>>>> Table1(Pk, CK, col1, col2, col3)
>>>> Table2(Pk, CK, col3, col4, col5)
>>>>
>>>> So here the partition and clustering keys are the same. Also note that
>>>> column col3 is required in both use cases.
>>>>
>>>> As per my thought, in Model 2 the partition size would be smaller,
>>>> there would be fewer SSTables, and with leveled compaction they could
>>>> be easily maintained - so read performance should be better.
>>>>
>>>> Please help me highlight the drawbacks and advantages of each data
>>>> model. Here we have a mixed (read/write) workload.
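For concreteness, here is one plausible CQL rendering of the two models discussed above (the thread gives only column names, so the table names and types are assumed):

    -- Model 1: a single combined table serving both use cases
    CREATE TABLE combined (
        pk   text,
        ck   text,
        col1 text,
        col2 text,
        col3 text,
        col4 text,
        col5 text,
        PRIMARY KEY (pk, ck)
    );

    -- Model 2: two tables, with col3 duplicated into both
    CREATE TABLE use_case_a (
        pk   text,
        ck   text,
        col1 text,
        col2 text,
        col3 text,
        PRIMARY KEY (pk, ck)
    );

    CREATE TABLE use_case_b (
        pk   text,
        ck   text,
        col3 text,
        col4 text,
        col5 text,
        PRIMARY KEY (pk, ck)
    );

    -- With Model 1, both use cases hit the same partition and the server
    -- just projects different columns:
    SELECT col1, col2, col3 FROM combined WHERE pk = ? AND ck = ?;
    SELECT col3, col4, col5 FROM combined WHERE pk = ? AND ck = ?;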