In practice, the performance you're getting is likely to be shaped by your read patterns. If you do a lot of sequential reads where key1 and key2 stay the same and only key3 varies, then you may get better performance out of the second option, because you'll hit the row and disk caches more often. If you're doing a lot of scatter reads, then you're likely to get better performance out of the first option, because the reads will be distributed more evenly across multiple nodes.

It also depends on how large your rows are going to be, since row size directly affects things like compaction, which in turn affects the speed of the entire cluster. For just a few values of key3, I doubt there would be much difference in performance, but if key3 has a cardinality of, say, a million, you might be better off with option 1.

As always, the advice is: benchmark your intended use case - put a few hundred gigs of mock data into a cluster, trigger compactions, and run perf tests for different kinds of read/write loads. :-)

(Though if I didn't know what my read pattern would be, I'd probably go for option 1 purely on gut feeling, provided I was sure I would never need range queries on key3; shorter rows *usually* are a bit better for performance, compaction, etc. Really wide rows can sometimes be an operational headache.)

May you have energy and success!

/Janne
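P.S. To make the comparison a bit more concrete, here is a rough sketch of how I read the two options, written against the DataStax Python driver. All the names (keyspace, tables, column types) are made up by me rather than taken from your schema, so treat it as an illustration of the key layout only, not as your actual model:

    from cassandra.cluster import Cluster

    # Connect to a local test cluster (adjust contact points as needed).
    session = Cluster(["127.0.0.1"]).connect()

    # A throwaway keyspace just for this sketch.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS modeling_sketch
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.set_keyspace("modeling_sketch")

    # Option 1: key3 is part of the partition key. Rows stay short and
    # reads scatter evenly across the cluster, but you cannot do range
    # queries or slices over key3.
    session.execute("""
        CREATE TABLE IF NOT EXISTS option1 (
            key1 text,
            key2 text,
            key3 text,
            value text,
            PRIMARY KEY ((key1, key2, key3))
        )
    """)

    # Option 2: key3 is a clustering column. Each (key1, key2) pair
    # becomes one wide row, so sequential reads over key3 benefit from
    # the row/disk caches and you can slice on key3, but very wide rows
    # make compaction heavier and can hot-spot individual nodes.
    session.execute("""
        CREATE TABLE IF NOT EXISTS option2 (
            key1 text,
            key2 text,
            key3 text,
            value text,
            PRIMARY KEY ((key1, key2), key3)
        )
    """)

The trade-off is the same whichever client or API you use; this is just the shortest way I could write it down. You'd still want to load both tables with realistic data and benchmark, as above.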