Hi all,
we use Cassandra to log event data and process it every 15 minutes with
Spark, using the Cassandra Java Connector for Spark.
At random, some of our Spark runs produce too few output records because
Cassandra returns no data for a window of several minutes of input data.
When we query the same data with cqlsh, it eventually becomes available
after multiple tries.
To solve the problem, we tried reading the data in Spark with consistency
level ALL. We use the
CassandraJavaUtil.javafunctions().cassandraTable() method and set
"spark.cassandra.input.consistency.level"="ALL" on the Spark config when
creating the Spark context. The problem persists, although according to
http://stackoverflow.com/a/25043599 writing with consistency level ONE
(which we do) and reading with ALL should be sufficient for consistent
data.
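For reference, the rule behind that answer is that a read is guaranteed to reach at least one replica that acknowledged a write when the number of replicas written (W) plus the number read (R) exceeds the replication factor (RF). Below is a minimal Java sketch of that arithmetic, assuming a single data center and only the levels ONE, QUORUM and ALL; the class and method names are hypothetical and not part of the Cassandra driver or the Spark connector.

```java
import java.util.Locale;

/**
 * Hypothetical helper illustrating the replica-overlap rule W + R > RF.
 * Assumes a single data center; consistency levels are plain strings here.
 */
public class ConsistencyCheck {

    /** Number of replicas that must respond for the given level (single-DC sketch). */
    public static int replicasFor(String level, int replicationFactor) {
        switch (level.toUpperCase(Locale.ROOT)) {
            case "ONE":    return 1;
            case "QUORUM": return replicationFactor / 2 + 1; // majority of replicas
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException("Unsupported level: " + level);
        }
    }

    /** True if every possible read replica set intersects every write replica set. */
    public static boolean readOverlapsWrite(String writeLevel, String readLevel,
                                            int replicationFactor) {
        int w = replicasFor(writeLevel, replicationFactor);
        int r = replicasFor(readLevel, replicationFactor);
        return w + r > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 2; // replication factor of our two-node cluster
        System.out.println("write ONE, read ALL:       " + readOverlapsWrite("ONE", "ALL", rf));
        System.out.println("write ONE, read ONE:       " + readOverlapsWrite("ONE", "ONE", rf));
        System.out.println("write QUORUM, read QUORUM: " + readOverlapsWrite("QUORUM", "QUORUM", rf));
    }
}
```

With RF=2 this prints true for ONE/ALL (1 + 2 > 2), false for ONE/ONE (1 + 1 = 2) and true for QUORUM/QUORUM (2 + 2 > 2). The rule only covers writes that Cassandra actually acknowledged; a read at ALL also fails with an error, rather than returning partial data, if one of the replicas is down.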
I would really appreciate it if someone could give me a hint on how to
fix this problem. Thanks!
Greets,
Dennis
P.S.
Some information about our setup:
Cassandra 2.1.12 in a two-node configuration with replication factor 2
Spark 1.5.1
Cassandra Java Driver 2.2.0-rc3
Spark Cassandra Java Connector 2.10-1.5.0-M2