Thank you all very much, Guo, Patrick and Jon! I will take a close look at the resources you are sharing.
On Thu, Feb 20, 2025 at 10:06 AM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I'll give you the general guidance around any type of storage you
> pick. This even applies to local disks, but it applies directly to
> your question.
>
> The key to success with storage and Cassandra is sequential,
> concurrent IO. Most of the large IO operations are either writing or
> reading a large file from disk. Sometimes, and this is the harder case
> to manage, both happen at the same time. Storage systems that bias
> toward reads or toward writes will create an imbalance that can lead
> to issues. And it is worth emphasizing: these are sequential reads and
> writes. IOPS are mostly irrelevant. The second aspect to manage is
> latency. Latency from disk directly correlates to query performance.
>
> Remote storage tends to have more trouble meeting these requirements.
> NFS, for example, has far too much latency and handles concurrency
> poorly. Just don't use it. The best thing you can do when looking at
> choices is run some simple tests. Another Jon Haddad resource, but a
> great one: https://www.youtube.com/watch?v=dPpEORxoMRU You don't even
> need to run Cassandra in the test. Just do some IO testing and verify
> that it can read and write in a balanced manner, observe the latency,
> and watch for any IOWait that creeps up.
>
> If you have a specific technology combination, just ask here.
> Collectively we have probably seen it all.
>
> Patrick
>
> On Wed, Feb 19, 2025 at 10:27 PM Long Pan <panlong...@gmail.com> wrote:
> >
> > Hi Cassandra Community,
> >
> > I’m exploring the feasibility of running Cassandra with remote
> > storage, primarily block storage (e.g., AWS EBS, OCI Block Volume,
> > Google Persistent Disk) and possibly even file storage (e.g., NFS,
> > EFS, FSx). While local SSDs are the typical recommendation for
> > optimal performance, I’d like to understand if anyone has experience
> > or insights on using remote disks in production.
> >
> > Specifically, I’m looking for guidance on:
> >
> > Feasibility – Has anyone successfully run Cassandra with remote
> > storage? If so, what use cases worked well?
> >
> > Major Downsides & Caveats – Are there any known performance
> > bottlenecks or consistency issues?
> >
> > Configuration Tuning – Are there any special settings (e.g.,
> > compaction, memtable flush thresholds, disk I/O tuning) that can
> > help mitigate potential drawbacks?
> >
> > Monitoring & Alerting – What are the key metrics and failure
> > scenarios to watch out for when using remote storage?
> >
> > I’d appreciate any insights, war stories, or best practices from
> > those who have experimented with or deployed Cassandra on remote
> > storage.
> >
> > Thanks,
> > Long Pan
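
A minimal sketch of the kind of standalone IO test suggested in the quoted reply, with no Cassandra involved. Python is an assumption here, as the thread names no tool or language. The function name `sequential_io_test`, its parameters, and the result keys are hypothetical, chosen only for illustration. It writes one file sequentially, reads it back sequentially, and reports throughput and worst per-block latency for each direction so the two can be compared for balance. It does not run reads and writes concurrently, and it does not measure IOWait; that would have to be watched separately (for example with iostat) while the test runs.

```python
import os
import tempfile
import time


def sequential_io_test(directory, total_bytes=256 * 1024 * 1024,
                       block_size=1024 * 1024):
    """Sequentially write, then read, one temp file under `directory`.

    Returns throughput (MB/s) and worst per-block latency (ms) for each
    direction. All names and defaults here are illustrative assumptions.
    """
    block = os.urandom(block_size)
    blocks = total_bytes // block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write. fsync after every block so the timing includes
        # the trip to the storage device, not just the OS page cache.
        write_lat = []
        bytes_written = 0
        with os.fdopen(fd, "wb", buffering=0) as f:
            for _ in range(blocks):
                start = time.perf_counter()
                bytes_written += f.write(block)
                os.fsync(f.fileno())
                write_lat.append(time.perf_counter() - start)

        # Sequential read. Caveat: this may be served from the page cache
        # unless the file is much larger than RAM or the cache is dropped.
        read_lat = []
        bytes_read = 0
        with open(path, "rb", buffering=0) as f:
            while True:
                start = time.perf_counter()
                chunk = f.read(block_size)
                elapsed = time.perf_counter() - start
                if not chunk:
                    break
                bytes_read += len(chunk)
                read_lat.append(elapsed)
    finally:
        os.remove(path)

    def mb_per_s(nbytes, latencies):
        return nbytes / max(sum(latencies), 1e-9) / 1e6

    return {
        "bytes_written": bytes_written,
        "bytes_read": bytes_read,
        "write_mb_s": mb_per_s(bytes_written, write_lat),
        "read_mb_s": mb_per_s(bytes_read, read_lat),
        "write_max_latency_ms": max(write_lat, default=0.0) * 1000,
        "read_max_latency_ms": max(read_lat, default=0.0) * 1000,
    }
```

To try it, call `sequential_io_test("/path/to/mounted/volume")` against the volume under evaluation and compare `write_mb_s` with `read_mb_s`. A large gap between the two, or a high maximum latency, is the kind of imbalance the quoted reply warns about. A dedicated tool such as fio gives far more control, so treat this only as a quick first check.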