Most of the issues around big nodes are related to streaming, which is currently quite slow (it should be a bit better in 4.0). HBase is built on top of Hadoop, which is much better suited to large files/very dense nodes, and tends to be quite average for transactional data. ScyllaDB I don't know about; I'd assume they just sorted out streaming by learning from C*'s mistakes.
On 29 August 2018 at 19:43, onmstester onmstester <onmstes...@zoho.com> wrote:

> Thanks Kurt,
> Actually my cluster has > 10 nodes, so there is a tiny chance to stream a
> complete SSTable.
> While logically any columnar NoSQL db like Cassandra always needs to
> re-sort grouped data for later fast reads, and having nodes with a big
> amount of data (> 2 TB) would be annoying for this background process,
> how is it possible that some of these databases, like HBase and ScyllaDB,
> do not emphasise small nodes (like Cassandra does)?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
> ============ Forwarded message ============
> From : kurt greaves <k...@instaclustr.com>
> To : "User" <user@cassandra.apache.org>
> Date : Wed, 29 Aug 2018 12:03:47 +0430
> Subject : Re: bigger data density with Cassandra 4.0?
> ============ Forwarded message ============
>
> My reasoning was if you have a small cluster with vnodes you're more
> likely to have enough overlap between nodes that whole SSTables will be
> streamed on major ops. As N gets > RF you'll have less common ranges and
> thus less likely to be streaming complete SSTables. Correct me if I've
> misunderstood.
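To make the range-overlap argument concrete, here's a toy simulation (my own sketch, not Cassandra's actual placement code — it mimics SimpleStrategy-style replica selection with random vnode tokens, whereas real clusters use Murmur3 hashing and usually NetworkTopologyStrategy). It counts how many distinct peers share at least one token range with a given node: when N is close to RF a node shares every range with the same few peers (so whole SSTables overlap), and as N grows past RF the shared ranges spread across many more peers.

```python
import random

def replica_peers(num_nodes, rf=3, vnodes=16, seed=42):
    """Place vnodes randomly on a unit ring, replicate each token range
    to the next RF distinct nodes clockwise, and return the set of nodes
    that share at least one replica set with node 0."""
    rng = random.Random(seed)
    # each node contributes `vnodes` random tokens to the ring
    ring = sorted((rng.random(), n) for n in range(num_nodes) for _ in range(vnodes))
    owners = [node for _, node in ring]
    peers = set()
    for i in range(len(owners)):
        # walk clockwise from this token, collecting RF distinct nodes
        replica_set, j = [], i
        while len(replica_set) < rf:
            node = owners[j % len(owners)]
            if node not in replica_set:
                replica_set.append(node)
            j += 1
        if 0 in replica_set:
            peers.update(replica_set)
    peers.discard(0)
    return peers

# N close to RF: node 0 can only overlap with the 3 other nodes,
# so its ranges (and hence whole SSTables) are common with each of them.
small = replica_peers(num_nodes=4)
# N >> RF: node 0's ranges are scattered across many more peers,
# so any one peer shares only a sliver of its data.
large = replica_peers(num_nodes=12)
```

With the small cluster `small` contains all three other nodes, while `large` contains far more peers, each sharing proportionally less — which is why whole-SSTable streaming gets unlikely as N grows past RF.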