Hi all. I'm in the bar-napkin phase of coming up with a big app. It's going to be a large graph application, so I was drawn to Cassandra, both because of Titan and because Cassandra's replication is far superior to that of Neo4j and the other open source systems I have looked at.
The last issue I'm dealing with before starting to write code is arbitrary file storage. The application will let users upload whatever they want (images, PDFs, etc.), and I need to put those files somewhere. (For the record, Amazon S3 is not an option. Long story.) So I'm looking at either a hugely expensive RAID array or an insanely complex distributed file system. Given the budget I'm dealing with, it will most likely be a distributed file system.
In the past hour or so, I stumbled on CFS. I think I know what it is, and that it's not going to work for me, but I just wanted to make sure.
From what I can tell:
First, it is a file system that does not like small files (15k images and such), because for each file you upload it's going to allocate a 2 MB block.
Second, it looks similar to HDFS in that the "FS" is misleading, and it should probably have been named CDS (Cassandra Data Store). I mean that in the sense that it wasn't designed for mapping a drive and dropping files in with Explorer. It's intended more as a convenient way to upload large files of structured data to your analytics engine (MapReduce or whatever), so back-end processes can rip them apart and tell you cool things you didn't know. Or, for us really old guys, think of it as an easy way to dump a butt load of data into your data warehouse without having to write an ETL up front; instead you write the ETL when you want to do something with the data.
Third, it looks like it's commercial, from that "stax something" company.
Am I wrong about any of this?
Thanks
--
You want it fast, cheap, or right. Pick two!!