Hello All. I am new to Cassandra and I am evaluating it for a project I am working on.
This project has several distribution models, ranging from a cloud distribution where we would be collecting hundreds of millions of rows per day to a single-box distribution where we could be collecting as few as 5 to 10 million rows per day.

Based on the experimentation and testing I have done so far, I believe that Cassandra would be an excellent fit for our large-scale cloud distribution, but from a maintenance/support point of view, we would like to keep our storage engine consistent across all distributions.

The single-box distribution could be running on a machine as small as an i3 processor with 4 GB of RAM and about 180 GB of disk space available for use. As a rough estimate, our storage engine could be allowed to consume about half of the processor and RAM resources.

I know that running Cassandra on a single instance throws away the majority of the benefits of using a distributed storage solution (distributed writes and reads, fault tolerance, etc.), but it might be worth the trade-off if we don't have to support two completely different storage solutions, even if they were hidden behind an abstraction layer from the application's point of view.

My question is: are we completely out to lunch in thinking that we might be able to run Cassandra in a reasonable way on such an under-powered box? I believe I recall reading in the DataStax documentation that the minimum recommended system requirements are 8 to 12 cores and 8 GB of RAM, which is a far cry from the lowest-end machine I'm considering.

Any info or help anyone could provide would be most appreciated.

Regards,
Daniel Morton