Hi all.  I'm in the bar napkin phase of coming up with a big app.  The
application is going to be a large graph app, so I was drawn to Cassandra
because of Titan, and because Cassandra's replication is far superior to
that of Neo4j and the other open source systems I have looked at.

The last issue I'm dealing with before starting to write code is arbitrary
file storage.  The application will let users upload whatever they like
(images, PDFs, etc.), and I need to put those files somewhere.  (For the
record, Amazon S3 is not an option; long story.)  So I'm looking at either
a hugely expensive RAID array or an insanely complex distributed file
system.  Given the budget I'm dealing with, most likely a distributed file
system.

Now, in the past hour or so, I stumbled on CFS.  I think I know what it
is, and that it's not going to work for me, but I just wanted to make sure.

From what I can tell, it is a file system that does not like small files
(15 KB images and such), because for each file you upload it's going to
allocate a 2 MB block.
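
To put a rough number on that worry, here is a back-of-the-envelope sketch
in Python.  It takes the statement above at face value, namely that every
file occupies at least one whole 2 MB block, which is an unconfirmed
assumption about CFS and not documented behaviour.  The function names and
the one-million-file count are made up for illustration.

```python
# Back-of-the-envelope check of the small-file concern.
# ASSUMPTION (unconfirmed): each file is rounded up to whole 2 MB blocks.

BLOCK_BYTES = 2 * 1024 * 1024   # assumed minimum allocation per file
FILE_BYTES = 15 * 1024          # a typical small image (15 KB)


def allocated_bytes(file_bytes, block_bytes=BLOCK_BYTES):
    """Space consumed if a file is rounded up to whole blocks."""
    blocks = max(1, -(-file_bytes // block_bytes))  # ceiling division
    return blocks * block_bytes


def overhead_factor(file_bytes, block_bytes=BLOCK_BYTES):
    """Ratio of allocated space to the file's real size."""
    return allocated_bytes(file_bytes, block_bytes) / file_bytes


if __name__ == "__main__":
    n_files = 1_000_000  # hypothetical number of uploads
    real = n_files * FILE_BYTES
    used = n_files * allocated_bytes(FILE_BYTES)
    print(f"overhead per file: {overhead_factor(FILE_BYTES):.1f}x")
    print(f"real data:  {real / 1024**3:.1f} GiB")
    print(f"allocated:  {used / 1024**4:.2f} TiB")
```

If the assumption holds, a 15 KB image would take roughly 137 times its
own size, before replication is counted.  If CFS does not actually pad
small files out to a full block, the overhead disappears and this concern
does not apply.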

Second, it looks like it's similar to HDFS in that the "FS" is misleading,
and it should probably have been named CDS (Cassandra Data Store).  I mean
that in the sense that it wasn't designed to be mapped as a drive you drop
files into with Explorer.  It seems intended more as a convenient way to
upload large files of structured data to your analytics engine (MapReduce
or whatever), so that back-end processes can rip them apart and tell you
cool things you didn't know.  Or, for us really old guys, think of it as an
easy way to dump a butt load of data into your data warehouse without
having to write an ETL up front; instead you write the ETL when you want to
do something with the data.

Third, it looks like it's commercial, from that stax something company.

Am I wrong about any of this?

Thanks

-- 
You want it fast, cheap, or right.  Pick two!!
