Gene,

I have found that clusters used as object stores have caused me more problems than normal in the past, so I recommend using a separate object store if possible.
However, it can certainly be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had problems in the past where deleted objects didn't get removed from disk. This was because, by the time they were deleted, they had been compacted into very large sstables that were rarely compacted again. So think about compaction strategy and any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. JPEGs? If so, turn compression off on the table; this reduces the amount of data read into memory when reading the data, reducing pressure on the heap. We did some trials with one system and found much better performance if the compression was performed on the client side. So try some tests with that.

3) How often is the data read? There will be completely different hardware requirements depending on whether this is an image store for an e-commerce site or a PDF store holding client invoices. With a small number of reads per object, you can specify machines with smaller CPUs and less memory but a large amount of storage. If there are a large number of reads, then you need to think much more carefully about memory and CPU, as per the Walmart article you referenced.

Thanks

Paul Chandler
www.redshots.com

> On 19 Apr 2019, at 09:04, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Idea:
>
> To guarantee data integrity, you can store an MD5 of all the chunk data as a
> static column in the partition that contains the chunks.
>
> On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 <cclive1...@gmail.com> wrote:
> We have used Cassandra as an object store for some years. You can just split
> the object into small pieces. The object gets a pk, and each small piece gets
> its own pk. The object's pk and the pieces' pks can be stored in a meta table
> in Cassandra, and each piece's pk and the piece itself are stored in a data
> table.
> We store videos, pictures, and other unstructured data.
>
> Gene <gh5...@gmail.com> wrote on Fri, Apr 19, 2019 at 1:25 PM:
> Howdy
>
> I'm looking at the possibility of using Cassandra as an object store to
> offload image/blob data from an Oracle database. I've seen mentions of it
> being used as an object store at large scale, like with Walmart:
>
> https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
>
> However, I have found little on small-scale setups and on whether it's even
> worth using Cassandra in place of something else that's meant to be used for
> object storage, like Ceph.
>
> Additionally, I've read that Cassandra struggles with storing objects 10MB or
> larger and it's recommended to break objects up into smaller chunks, which
> either requires some kind of middleware between our application and
> Cassandra, or it would require our application to split objects into smaller
> chunks and recombine them as needed.
>
> I've looked into pithos and astyanax, but those are both no longer developed
> and I'm not seeing anything that might replace them in the long term.
>
> https://github.com/exoscale/pithos
> https://github.com/Netflix/astyanax
>
> Any helpful information or advice would be greatly appreciated.
>
> Thanks in advance.
>
> -Gene
>
> --
> you are the apple of my eye !
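
For what it's worth, the chunk-and-checksum idea discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not anyone's production code: the 1 MiB chunk size is an assumed value, the function names are hypothetical, and the two dicts stand in for the meta table and data table (no Cassandra driver is involved). Here the MD5 is kept in the meta entry; storing it as a static column alongside the chunks, as DuyHai suggests, would work the same way.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an assumed value, tune it against your own tests


def split_object(data, chunk_size=CHUNK_SIZE):
    """Split a blob into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def object_md5(chunks):
    """MD5 hex digest over all chunk data, taken in order."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def store(meta, data_table, object_id, payload, chunk_size=CHUNK_SIZE):
    """Write the chunks to the (stand-in) data table and a summary to the meta table."""
    chunks = split_object(payload, chunk_size)
    for index, chunk in enumerate(chunks):
        # (object_id, index) plays the role of each piece's primary key.
        data_table[(object_id, index)] = chunk
    meta[object_id] = {"chunk_count": len(chunks), "md5": object_md5(chunks)}


def load(meta, data_table, object_id):
    """Read the chunks back in order, verify the checksum, and reassemble."""
    entry = meta[object_id]
    chunks = [data_table[(object_id, i)] for i in range(entry["chunk_count"])]
    if object_md5(chunks) != entry["md5"]:
        raise ValueError("checksum mismatch for object %r" % (object_id,))
    return b"".join(chunks)
```

In a real setup the two dicts would be replaced by reads and writes against the actual tables, and the application (or a thin middleware layer) would call something like `store` and `load` so that callers never see the individual chunks.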