+1 to what Ed said.
CFS is a good fit for running MR jobs on Cassandra: it fills the HDFS
requirement when you just want to run MR but don't want the whole Hadoop
stack. The source data for your MR jobs should be in Cassandra KS/CFs.
On Mon, Nov 18, 2013 at 3:21 PM, Edward Capriolo wrote:
CFS was written so that Brisk (now defunct) did not need a separate Hadoop
HDFS stack (NN + DataNodes) to do MapReduce work. It is better served as
an alternative to HDFS, not as a general-purpose distributed file system.
On Mon, Nov 18, 2013 at 2:02 PM, Robert Coli wrote:
On Sat, Nov 16, 2013 at 9:10 PM, Willie Slepecki wrote:
> The last issue I'm dealing with before starting to write code is random
> file storage. The application will have the ability to upload whatever
> (images, PDFs, etc.), and I need to put them somewhere. (For the record,
> Amazon S3 is not a
Having used (and moved off of) Titan I do not recommend it as a primary
database. Until it overcomes its extremely unoptimized graph traversals, it
will increase the load on your database by several orders of magnitude.
As a secondary analytics database, it might do fine. Just don’t rely on
Hi all. I'm in the bar napkin phase of coming up with a big app. The
application is going to be a large graph app, so I was drawn to Cassandra
because of Titan, and because Cassandra's replication is far superior to
that of Neo4j and the other open source systems I have looked at.
The last issue I'm dealing with