Allow filtering is almost never the answer, especially when you want to do a full table scan ( there might be some cases where the query is limited to a partition and allow filtering could be used). And you would like to run this query every minute - thus extremely good performance is required. Allow filtering basically brings locally in your coordinator the whole table content and performs local filtering of the data before answering your query. Performance wise is not recommended to use such an implementation.
For a query running every minute you need to address it in one partition read (according to Cassandra data modeling rules) and that can be done with denormalization ( manually or materialized views). As far as I know and also from the discussions in this list MV should be used still with caution in production environments. Thus, the best option in my opinion is manual denormalization of data, building a table with partition key last_seen and clustering key username and adding/updating data accordingly. Furthermore last_seen I understand it's a value of any time/hour of day - you could consider building partitions per day: partition key = (last_seen, day), primary key = ((last_seen,day),username)). Valentina On Mon, Oct 9, 2017 at 1:13 PM, Avi Levi <a...@indeni.com> wrote: > Hi > > I have the following table: > > CREATE TABLE users ( > username text, > last_seen bigint, > PRIMARY KEY (username) > ); > > where* last_seen* is basically the writetime . Number of records in the > table is aprox 10 million. Insert is pretty much straightforward insert > into users (username, last_seen) VALUES ([username], now) > > I want to make some processing on users that were not seen for the past > XXX (where xxx can be hours/days ... ) by query the last_seen column > (this query runs every minute) e.g : > > select username from users where last_seen < (now - 1 day). > > I have two options as I see it: > > 1. use materialized view : > > CREATE MATERIALIZED VIEW users_last_seen AS > SELECT last_seen, username > FROM users > WHERE last_seen IS NOT NULL > PRIMARY KEY (last_seen, username); > > > and simply query: > > select username from users_last_seen where last_seen < (now - 1 day) > > 1. > > query the users table > > select username from users where last_seen < (now - 1 day) ALLOW > FILTERING > > which one is more efficient? any other options ? > > Any help will be greatly appreciated > > Best > > Avi >