Re: Best way to do a multi_get using CQL

Marcelo Elias Del Valle Thu, 19 Jun 2014 18:12:36 -0700

But wouldn't using async queries be even worse than using SELECT IN?
The justification in the docs is that IN could query many nodes, but with
async queries I would still be querying them.


Today, I use both async queries AND SELECT IN:

import logging

SELECT_ENTITY_LOOKUP = ("SELECT entity_id FROM " + ENTITY_LOOKUP +
                        " WHERE name=%s and value in(%s)")

futures = []
entity_ids = set()

for name, values in identifiers.items():
    # Expand the IN placeholder to one %s per value, e.g. in(%s,%s,%s)
    query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s'] * len(values)))
    args = [name] + list(values)
    query_msg = query % tuple(args)  # human-readable form, used only in the error log
    futures.append((query_msg, self.session.execute_async(query, args)))

# All queries are now in flight; wait for each one and collect the ids
for query_msg, future in futures:
    try:
        rows = future.result(timeout=100000)
        for row in rows:
            entity_ids.add(row.entity_id)
    except Exception:
        logging.exception("Query '%s' returned ERROR", query_msg)
        raise

Using async queries with just SELECT ... = would mean that instead of 1 async
query (example: in (0, 1, 2)), I would issue several, one for each element of
the "values" array above.
In my head, this would mean more connections to Cassandra for the same
amount of work, right? What would be the advantage?
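For comparison, the one-query-per-value variant described above might look like the sketch below. It assumes a `session` object from the DataStax Python driver (`cassandra-driver`) and the `entity_lookup` table from the original question; `lookup_entity_ids` and `LOOKUP_CQL` are hypothetical names, not part of the driver. Each `SELECT ... WHERE name = ? AND value = ?` targets exactly one partition, and the driver multiplexes the in-flight requests over its pooled connections rather than opening one connection per query.

```python
# Sketch: one async single-partition query per (name, value) pair.
# Assumes `session` is a cassandra.cluster.Session (DataStax Python driver);
# `lookup_entity_ids` is a hypothetical helper, not part of the driver.

LOOKUP_CQL = "SELECT entity_id FROM entity_lookup WHERE name = ? AND value = ?"


def lookup_entity_ids(session, identifiers):
    """Return the set of entity_ids for every (name, value) in `identifiers`.

    `identifiers` maps a name to a list of values, as in the code above.
    """
    # Prepare once; each execution then only sends the bound values.
    prepared = session.prepare(LOOKUP_CQL)

    # Fire every query before waiting on any of them, so they run concurrently.
    futures = []
    for name, values in identifiers.items():
        for value in values:
            futures.append(session.execute_async(prepared, (name, value)))

    # Collect results; result() blocks until that query finishes
    # and re-raises its error if it failed.
    entity_ids = set()
    for future in futures:
        for row in future.result():
            entity_ids.add(row.entity_id)
    return entity_ids
```

The trade-off is more requests, each touching a single partition (and, with a token-aware load balancing policy, sent directly to a replica), versus one request whose coordinator must contact replicas for every partition in the IN list. For very long value lists, capping the number of in-flight requests (for example with `cassandra.concurrent.execute_concurrent_with_args` and its `concurrency` argument) is probably worth considering.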

Regards,




2014-06-19 22:01 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:

> Your other option is to fire off async queries.  It's pretty
> straightforward w/ the Java or Python drivers.
>
> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
> <marc...@s1mbi0se.com.br> wrote:
> > I was taking a look at the Cassandra anti-patterns list:
> >
> >
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
> >
> > Among them is:
> >
> > SELECT ... IN or index lookups
> >
> > SELECT ... IN and index lookups (formerly secondary indexes) should be
> > avoided except for specific scenarios. See "When not to use IN" in SELECT
> > and "When not to use an index" in Indexing in CQL for Cassandra 2.0.
> >
> > And looking at the SELECT doc, I saw:
> >
> > When not to use IN
> >
> > The recommendations about when not to use an index apply to using IN in
> > the WHERE clause. Under most conditions, using IN in the WHERE clause is
> > not recommended. Using IN can degrade performance because usually many
> > nodes must be queried. For example, in a single, local data center
> > cluster having 30 nodes, a replication factor of 3, and a consistency
> > level of LOCAL_QUORUM, a single key query goes out to two nodes, but if
> > the query uses the IN condition, the number of nodes being queried are
> > most likely even higher, up to 20 nodes depending on where the keys fall
> > in the token range.
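The node counts in the documentation excerpt can be made concrete with a toy model. This is not Cassandra's real placement or read path: it assumes a ring of 30 nodes with one evenly spaced token each, an MD5-based stand-in for the Murmur3 partitioner, SimpleStrategy-style placement (a key's replicas are the owning node plus the next two clockwise), and that a LOCAL_QUORUM read contacts the first 2 of the 3 replicas. A real coordinator may choose replicas differently (for example via the dynamic snitch). `token`, `replicas_for` and `nodes_contacted` are hypothetical names.

```python
import hashlib

NUM_NODES = 30                          # single-DC cluster from the quoted example
REPLICATION_FACTOR = 3
QUORUM = REPLICATION_FACTOR // 2 + 1    # 2 replicas for RF=3
RING_SIZE = 2 ** 64


def token(key):
    """Toy stand-in for the partitioner: hash the key onto [0, RING_SIZE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key):
    """Owning node plus the next RF-1 nodes clockwise (SimpleStrategy-like)."""
    # Node i owns tokens in [i * step, (i + 1) * step); the last node
    # also absorbs the few tokens left over by the integer division.
    step = RING_SIZE // NUM_NODES
    owner = min(token(key) // step, NUM_NODES - 1)
    return [(owner + i) % NUM_NODES for i in range(REPLICATION_FACTOR)]


def nodes_contacted(keys):
    """Distinct nodes read from, assuming QUORUM replicas are asked per key."""
    nodes = set()
    for key in keys:
        nodes.update(replicas_for(key)[:QUORUM])
    return nodes


print(len(nodes_contacted(["value0"])))                           # → 2
print(len(nodes_contacted(["value%d" % i for i in range(10)])))   # IN with 10 keys
```

A single-key lookup always touches 2 nodes in this model, while an IN list touches up to 2 nodes per key, capped by the cluster size, which is the effect the excerpt describes.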
> >
> > In my system, I have a column family called "entity_lookup":
> >
> > CREATE KEYSPACE IF NOT EXISTS Identification1
> >   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
> >   'DC1' : 3 };
> > USE Identification1;
> >
> > CREATE TABLE IF NOT EXISTS entity_lookup (
> >   name varchar,
> >   value varchar,
> >   entity_id uuid,
> >   PRIMARY KEY ((name, value), entity_id));
> >
> > And I use the following select to query it:
> >
> > SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)
> >
> > Is this an anti-pattern?
> >
> > If not using SELECT IN, which other way would you recommend for lookups
> > like that? I have several values I would like to search for in Cassandra,
> > and they might not be in the same partition, as above.
> >
> > Is Cassandra the wrong tool for lookups like that?
> >
> > Best regards,
> > Marcelo Valle.
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
