Hi Oleg,
Connectors don't deal with HA; they rely on Spark for that, so neither
the DataStax connector, Stratio Deep nor Calliope has anything to do
with Spark's HA. You should already have configured Spark so that it
meets your high availability needs. Furthermore, as I mentioned in a
previous answer, Spark can be configured for high availability
without the use of Mesos. You can find more information at
<https://spark.apache.org/docs/latest/spark-standalone.html#high-availability>.
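As a rough illustration of the standalone HA setup that page describes: with ZooKeeper-based recovery, the Masters are started with spark.deploy.recoveryMode=ZOOKEEPER and spark.deploy.zookeeper.url set (through SPARK_DAEMON_JAVA_OPTS), and the application lists every Master in its master URL so it can fail over to whichever one is the leader. The sketch below (PySpark) covers only the application side. The host names, the port, the application name and both helper names are made up for the example.

```python
def standalone_ha_master_url(masters):
    """Build a standalone master URL that lists every Master (active and standby).

    `masters` is a list of "host:port" strings; the values are hypothetical.
    """
    if not masters:
        raise ValueError("at least one master host:port is required")
    return "spark://" + ",".join(masters)


def build_context(masters, app_name="cassandra-aggregations"):
    """Create a SparkContext registered with all listed Masters."""
    # Imported lazily so the URL helper can be used without PySpark installed.
    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName(app_name)
        .setMaster(standalone_ha_master_url(masters))
    )
    return SparkContext(conf=conf)
```

For example, build_context(["master1:7077", "master2:7077"]) would register the application with both Masters. None of this depends on which Cassandra connector is used.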
The three of them have similar features, so all of them seem like good
choices. One of the highlights of Stratio Deep is that it can
connect to multiple databases, not just Cassandra (currently
Cassandra and MongoDB, with more on the roadmap). Also take into account
that Stratio Deep's integration with Cassandra was developed from the
ground up, making no use of Hadoop at all.
On the other hand, Spark does in-memory computation, but this doesn't
mean it can't process data that doesn't fit in memory. It will
use disk if told to. Quoting the official Spark FAQ: "Spark can
either spill it to disk or recompute the partitions that don't fit in
RAM each time they are requested. By default, it uses recomputation, but
you can set a dataset's storage level to MEMORY_AND_DISK to avoid this."
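To make the storage level part concrete, here is a minimal PySpark sketch, not tied to any of the three connectors. It persists a keyed dataset with MEMORY_AND_DISK so that partitions that don't fit in RAM are spilled to disk instead of being recomputed, then runs two aggregations over it. The function name and the use of parallelize are assumptions for the example; in practice the RDD would come from whichever Cassandra connector you choose.

```python
from operator import add


def sum_and_count_by_key(sc, pairs, num_slices=4):
    """Run two aggregations over the same (key, value) pairs.

    `sc` is an existing SparkContext. `pairs` stands in for data that would
    normally be read from Cassandra through a connector.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark import StorageLevel

    rdd = sc.parallelize(pairs, num_slices)
    # Partitions that don't fit in memory are written to disk rather than
    # recomputed each time they are requested.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    try:
        sums = rdd.reduceByKey(add).collectAsMap()  # total value per key
        counts = dict(rdd.countByKey())             # number of records per key
    finally:
        rdd.unpersist()
    return sums, counts
```

Persisting only pays off because the dataset is reused by both aggregations; for a single pass over the data it could be left out.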
On 11/09/14, Oleg Ruchovets wrote:
Ok.
So DataStax and Stratio require Mesos, Hadoop YARN or another third
party to get Spark cluster HA.
What about Calliope?
Is it sufficient to have Cassandra + Calliope + Spark to be able
to process aggregations?
In my case we have quite a lot of data, so doing aggregations only in
memory is impossible.
Does Calliope support a non-in-memory mode for Spark?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 9:23 PM, abhinav chowdary
<abhinav.chowd...@gmail.com> wrote:
Adding to conversation...
There are 3 great open source options available:
1. Calliope http://tuplejump.github.io/calliope/
This was the first library out, some time late last
year (as far as I can recall). I have been using it for a while and
it is mostly very stable. It uses the Hadoop I/O in Cassandra (note
that it doesn't require Hadoop).
2. DataStax Spark Cassandra Connector
https://github.com/datastax/spark-cassandra-connector: The main
difference is that this uses CQL3. Again a great library, but it has
a few issues. It is also very actively developed. It still uses
Thrift for minor stuff, but all the heavy lifting is done in CQL3.
3. Stratio Deep https://github.com/Stratio/stratio-deep: Has a lot
more to offer if you use the whole Stratio stack. Deep is for Spark,
Stratio Streaming is built on top of Spark Streaming, Stratio Meta
is something similar to Shark or Spark SQL, and finally Stratio
Cassandra is a fork of Cassandra with advanced Lucene-based
indexing.