Re: Duplicated key with an IN statement

Jack Krupansky Thu, 04 Feb 2016 08:03:39 -0800

Sylvain, there's a bug in CHANGES.TXT for this issue. It says: "Duplicate
rows returned when in clause has repeated values (CASSANDRA-6707)", but the
issue number is really 6706.


-- Jack Krupansky

On Thu, Feb 4, 2016 at 9:54 AM, Sylvain Lebresne <sylv...@datastax.com>
wrote:

> That behavior has been changed in 2.2 and upwards. If you don't like it,
> upgrade. In the meantime, it's probably not hard to avoid passing duplicate
> keys in IN.
>
> On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE <edouard.c...@rgsystem.com>
> wrote:
>
>> Hello,
>>
>>
>>
>> When running that kind of query with TRACING ON; I noticed the
>> coordinator is also performing multiple time the same query
>>
>>
>>
>> Because the element in the IN statement can involve many nodes, it makes
>> sense to map/reduce the query, but running multiple time the same sub query
>> should not happen. What if the result set change? Let’s imagine that query
>> : SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this
>> query runs, the data for 123 change?
>>
>>
>>
>> key | value
>>
>> -----+-------
>>
>> 123 |   456
>>
>> 123 |   456
>>
>>  123 |   456
>>
>>  123 |   789 <-- Change here L
>>
>> 123 |   789
>>
>>
>>
>>
>>
>> There’s also something very important: when your table define a tuple
>> being unique for a specific key, this is a real problem to be able to have
>> a result set having multiple time the same key, which should be unique.
>> This is why on every SQL implementation, this is not happening
>>
>>
>>
>> I think this is a bug
>>
>>
>>
>> Edouard COLE
>>
>>
>>
>>
>>
>> *De :* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>> *Envoyé :* Thursday, February 04, 2016 11:55 AM
>> *À :* Edouard COLE
>> *Cc :* user@cassandra.apache.org
>> *Objet :* Re: Duplicated key with an IN statement
>>
>>
>>
>> Hi,
>>
>>
>>
>> This is interesting.
>>
>>
>>
>> It seems rational that if you are looking at 2 keys and both exist (which
>> is the case) it returns you 2 keys, it. Yet, I just checked this kind of
>> command on MySQL and it gives a one line result. So here CQL differs from
>> SQL (at least MySQL). I know we are trying to fit as much as possible with
>> SQL to avoid loosing people, so we might want to change this.
>>
>> Not sure if this behavior is intentional / known. Not even sure someone
>> ever tried to do this kind of query actually :).
>>
>>
>>
>> Does anyone know about that ? Should we raise a ticket ?
>>
>>
>>
>> -----------------
>>
>> Alain Rodriguez
>>
>> France
>>
>>
>>
>> The Last Pickle
>>
>> http://www.thelastpickle.com
>>
>>
>>
>>
>>
>>
>>
>> 2016-02-04 8:36 GMT+00:00 Edouard COLE <edouard.c...@rgsystem.com>:
>>
>> Hello,
>>
>> I just discovered this, and I think this is weird:
>>
>> ed@debian:~$ cqlsh 192.168.10.8
>> Connected to _CLUSTER_ at 192.168.10.8:9160.
>> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
>> 19.39.0]
>> Use HELP for help.
>> cqlsh> USE ks-test ;
>> cqlsh:ks-test> CREATE TABLE t (
>>             ...     key int,
>>             ...     value int,
>>             ...     PRIMARY KEY (key)
>>             ... );
>> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
>> cqlsh:ks-test> SELECT * FROM t ;
>>
>>  key | value
>> -----+-------
>>  123 |   456
>>
>> (1 rows)
>>
>> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>>
>>  key | value
>> -----+-------
>>  123 |   456
>>  123 |   456 <----- WTF?
>>
>> (2 rows)
>>
>> Adding multiple time the same key into an IN statement make the query
>> returns multiple time the tuple
>>
>> This looks weird to me, can anyone give me some feedback on such a
>> behavior?
>>
>> Edouard COLE
>>
>>
>>
>
>

Re: Duplicated key with an IN statement

Reply via email to