Re: Performance impact with ALLOW FILTERING clause.

2019-08-17 Thread Devopam Mittra
Hi Asad, It seems to me that your development team will need to remodel the tables sooner rather than later. This problem can't be left unattended for long once it starts hitting severely. The way Cassandra is, you may want to have them replicate the same table with different PK / structure to suitably embed

Re: Performance impact with ALLOW FILTERING clause.

2019-08-17 Thread Alex Ott
The Spark connector doesn't run "select * from table;". It reads the data by token ranges (see https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14) Jac
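
A minimal sketch of what "reads by token ranges" can look like, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1). This is not the connector's actual code: the helper names `split_token_ring` and `range_queries`, and the keyspace, table, and column names in the usage line, are hypothetical. The real connector also aligns its splits with replica ownership, which this sketch ignores.

```python
# Token bounds of Murmur3Partitioner (assumed partitioner for this sketch).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Split the whole token ring into contiguous (start, end] ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(keyspace, table, partition_key, num_splits):
    """Build one CQL statement per token range instead of a single full scan."""
    queries = []
    for i, (start, end) in enumerate(split_token_ring(num_splits)):
        # The first range includes its lower bound so no token is skipped.
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) {lower_op} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


# Hypothetical names, for illustration only.
queries = range_queries("ks", "events", "id", 4)
```

Each statement can then be sent to a node that owns that range, which is how a full-table read gets spread across the cluster rather than funneled through one coordinator.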

Re: Performance impact with ALLOW FILTERING clause.

2019-07-26 Thread Christian Lorenz
Date: Thursday, 25 July 2019 at 20:05 To: "user@cassandra.apache.org" Subject: RE: Performance impact with ALLOW FILTERING clause. Thank you all for your insights. When spark-connector adds ALLOW FILTERING to a query, it makes the query just ‘run’ no matter if it is expensive for lar

Re: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread Jon Haddad
luence connector not to use allow filtering. Thanks again. Asad From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Thursday, July 25, 2019 10:24 AM To: cassandra Subject: Re: Performance impact with ALLOW FILTERING cl

RE: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread ZAIDI, ASAD A
Jirsa [mailto:jji...@gmail.com] Sent: Thursday, July 25, 2019 10:24 AM To: cassandra Subject: Re: Performance impact with ALLOW FILTERING clause. "unpredictable" is such a loaded word. It's quite predictable, but it's often mispredicted by users. "ALLOW FILTERING"
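
A toy illustration of why the cost is predictable: a filtering query has to read every row it scans, while a partition-key lookup reads only one partition. The `ToyTable` class and its methods are hypothetical and say nothing about Cassandra's storage engine; they only count the rows each style of query touches.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table; not Cassandra's real storage."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> list of rows
        self.rows_read = 0                   # rows touched by the last query

    def insert(self, partition_key, row):
        self.partitions[partition_key].append(row)

    def select_by_partition(self, partition_key):
        """Like WHERE pk = ?: touches a single partition."""
        rows = self.partitions.get(partition_key, [])
        self.rows_read = len(rows)
        return list(rows)

    def select_allow_filtering(self, predicate):
        """Like a non-key predicate with ALLOW FILTERING: scans every row."""
        self.rows_read = 0
        matches = []
        for rows in self.partitions.values():
            for row in rows:
                self.rows_read += 1
                if predicate(row):
                    matches.append(row)
        return matches


# 100 partitions of 10 rows each; one row per partition is "open".
table = ToyTable()
for user_id in range(100):
    for n in range(10):
        status = "open" if n == 0 else "closed"
        table.insert(user_id, {"user_id": user_id, "n": n, "status": status})

table.select_by_partition(42)
lookup_cost = table.rows_read   # rows read for the key lookup

open_rows = table.select_allow_filtering(lambda r: r["status"] == "open")
scan_cost = table.rows_read     # rows read for the filtering scan
```

The filtering query returns a tenth of the rows but reads all of them, so its cost tracks table size rather than result size.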

Re: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread Jeff Jirsa
"unpredictable" is such a loaded word. It's quite predictable, but it's often mispredicted by users. "ALLOW FILTERING" basically tells the database you're going to do a query that will require scanning a bunch of data to return some subset of it, and you're not able to provide a WHERE clause that'

Re: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread Jacques-Henri Berthemet
Hi Asad, That’s because of the way Spark works. Essentially, when you execute a Spark job, it pulls the full content of the datastore (Cassandra in your case) into its RDDs and works with it “in memory”. While Spark uses “data locality” to read data from the nodes that have the required data on it