Best way to do a multi_get using CQL

Marcelo Elias Del Valle Thu, 19 Jun 2014 17:57:06 -0700

I was taking a look at Cassandra anti-patterns list:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html


Among then is

SELECT ... IN or index lookups¶
<http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=archPlanAntiPattern__AntiPatMultiGet>

SELECT ... IN and index lookups (formerly secondary indexes) should be
avoided except for specific scenarios. See *When not to use IN* in SELECT
<http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html>
 and *When not to use an index* in Indexing
<http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_primary_index_c.html>
 in
*CQL for Cassandra 2.0*"

And Looking at the SELECT doc, I saw:
When *not* to use IN¶
<http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__selectInNot>
The recommendations about when not to use an index
<http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html>
 apply to using IN in the WHERE clause. Under most conditions, using IN in
the WHERE clause is not recommended. Using IN can degrade performance
because usually many nodes must be queried. For example, in a single, local
data center cluster having 30 nodes, a replication factor of 3, and a
consistency level of LOCAL_QUORUM, a single key query goes out to two
nodes, but if the query uses the IN condition, the number of nodes being
queried are most likely even higher, up to 20 nodes depending on where the
keys fall in the token range."

In my system, I have a column family called "entity_lookup":

CREATE KEYSPACE IF NOT EXISTS Identification1
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 3 };
USE Identification1;

CREATE TABLE IF NOT EXISTS entity_lookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id));

And I use the following select to query it:

SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)

Is this an anti-pattern?

If not using SELECT IN, which other way would you recomend for lookups like
that? I have several values I would like to search in cassandra and they
might not be in the same particion, as above.

Is Cassandra the wrong tool for lookups like that?

Best regards,
Marcelo Valle.

Best way to do a multi_get using CQL

Reply via email to