Hi Asad,
It seems to me that your development team will need to remodel the tables
sooner rather than later. This problem can't be left unattended for long once
it starts to bite severely.
Given the way Cassandra works, you may want to have them duplicate the same
table with a different PK / structure, so that the columns a query filters on
are embedded in the key.
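That duplicate-table idea (query-driven denormalization) can be sketched with plain Python dicts standing in for Cassandra tables; the table and column names here are hypothetical, not from the original thread:

```python
# Hypothetical sketch: the same rows kept in two "tables", each keyed
# (partitioned) by the column a given query filters on, so neither
# query needs a full scan / ALLOW FILTERING.
rows = [
    {"user_id": "u1", "country": "DE", "name": "Ada"},
    {"user_id": "u2", "country": "US", "name": "Bob"},
    {"user_id": "u3", "country": "DE", "name": "Cid"},
]

# users_by_id: PK = user_id  -> serves "look up one user"
users_by_id = {r["user_id"]: r for r in rows}

# users_by_country: PK = country -> serves "all users in a country"
users_by_country = {}
for r in rows:
    users_by_country.setdefault(r["country"], []).append(r)

print(users_by_id["u2"]["name"])                      # keyed lookup
print([r["name"] for r in users_by_country["DE"]])    # keyed lookup too
```

In Cassandra itself this would be two tables with different PRIMARY KEY definitions, each written on every insert.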
The Spark connector doesn't issue a "select * from table;" - it reads the
data by token ranges
(see
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14)
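For illustration, the token-range style of reading can be sketched as below. This is a simplified stand-in for what the linked CassandraPartition code does, assuming the Murmur3 partitioner's token space of -2^63 .. 2^63-1; the function name is my own:

```python
# Sketch: split Cassandra's Murmur3 token space into n contiguous
# ranges, roughly the way the connector partitions a full-table read.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_ranges(n):
    """Return n contiguous (start, end] ranges covering the token space."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN if i == n - 1 else start + span
        ranges.append((start, end))
        start = end
    return ranges

# Each range would then back one query of the form:
#   SELECT * FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?
for start, end in token_ranges(4):
    print(start, end)
```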
Jac
Date: Thursday, July 25, 2019 at 20:05
To: "user@cassandra.apache.org"
Subject: RE: Performance impact with ALLOW FILTERING clause.
Thank you all for your insights.
When the spark-connector adds ALLOW FILTERING to a query, it makes the query
just 'run' no matter how expensive it is for large data sets. How can I
influence the connector not to use ALLOW FILTERING?
>
> Thanks again.
>
> Asad
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Thursday, July 25, 2019 10:24 AM
> *To:* cassandra
> *Subject:* Re: Performance impact with ALLOW FILTERING clause.
"unpredictable" is such a loaded word. It's quite predictable, but it's
often mispredicted by users.
"ALLOW FILTERING" basically tells the database you're going to do a query
that will require scanning a bunch of data to return some subset of it, and
you're not able to provide a WHERE clause that restricts it to specific
partitions.
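The cost difference Jeff describes can be sketched by counting touched rows; the table and column names below are hypothetical, with the equivalent CQL shown in comments:

```python
# Sketch: keyed read vs. ALLOW FILTERING scan, modeled as a dict of
# partitions. We count how many rows each access pattern must touch.
table = {
    f"pk{i}": {"pk": f"pk{i}", "status": "active" if i % 10 == 0 else "idle"}
    for i in range(1000)
}

# SELECT * FROM t WHERE pk = 'pk42';
# -- keyed read: the partitioner locates exactly one partition
keyed = table["pk42"]

# SELECT * FROM t WHERE status = 'active' ALLOW FILTERING;
# -- no usable key: every row is scanned, a small subset is returned
scanned = [r for r in table.values() if r["status"] == "active"]
print(len(table), "rows scanned to return", len(scanned))
```

The second pattern is "predictable" in exactly the sense above: its cost grows with the size of the whole table, not with the size of the result.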
Hi Asad,
That’s because of the way Spark works. Essentially, when you execute a Spark
job, it pulls the full content of the datastore (Cassandra in your case) into
its RDDs and works with it “in memory”. While Spark uses “data locality” to
read data from the nodes that hold the required data, it still has to read
the full table to build those RDDs.
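A minimal sketch of that read-everything-then-filter behavior, with a Python list standing in for both the datastore and the RDD (the real connector does this over token ranges, as noted earlier in the thread):

```python
# Sketch: Spark-style processing reads the whole dataset into an
# in-memory RDD stand-in and filters there, rather than asking the
# store to return only the matching rows.
datastore = [{"id": i, "even": i % 2 == 0} for i in range(100)]

# "RDD": the full content of the datastore, transferred once
rdd = list(datastore)                    # 100 rows read

# filtering happens in memory, after the full read
result = [r for r in rdd if r["even"]]
print(len(rdd), "rows read,", len(result), "rows kept")
```

This is why pushing predicates down to Cassandra (when the schema allows it) matters: otherwise the filter cost is paid after the entire table has already crossed the wire.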