Re: Question: Behavior of inserting a list multiple times with same timestamp

Fridtjof Sander Fri, 30 Jun 2017 02:20:13 -0700

On insert, C* creates a tombstone for the list-column with "T-1" (it'sactually -1us) as timestamp.The second insert creates that tombstone again, but that doesn't deletethe previous insert, since it's timestamp is T not T-1us:


$ INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
$ flush
$ sstabledump 1.db
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 48,
        "liveness_info" : { "tstamp" : "*1970-01-01T00:00:00.001Z*" },
        "cells" : [

{ "name" : "v", "deletion_info" : { "marked_deleted" :"*1970-01-01T00:00:00.000999Z*", "local_delete_time" :"2017-06-30T09:01:26Z" } },{ "name" : "v", "path" : ["*b29084c0-5d72-11e7-a054-ebe76dac4f21*" ], "value" : "1" }

        ]
      }
    ]
  }
]
$ INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
$ flush
$ sstabledump 2.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 48,
        "liveness_info" : { "tstamp" : "*1970-01-01T00:00:00.001Z*" },
        "cells" : [

{ "name" : "v", "deletion_info" : { "marked_deleted" :"*1970-01-01T00:00:00.000999Z*", "local_delete_time" :"2017-06-30T09:03:58Z" } },{ "name" : "v", "path" : ["*0da21e00-5d73-11e7-a054-ebe76dac4f21*" ], "value" : "1" }

(The uuids are generated based on server timestamp I believe). That'swhy you get both: On read both values are kept, since their TS is 1msand are not deleted by the 999us tombstones.

If you are using actual, incrementing timestamps the tombstones ofsubsequent inserts do delete previous values, since their timestamps arebigger.


Am 26.06.17 um 13:12 schrieb Vladimir Yudovin:

Hi,

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
    INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;
    INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;
    SELECT * FROM test.test ;

    cqlsh> SELECT * FROM test.test ;

    k | v
    ---+-----
    1 | [3]
    // ===== WHY ?? =====

TIMESTAMP is measured as epoch_in_microseconds, so USING TIMESTAMP1000 is like insert some-when on 1970-01-01, thus it has no effectafter insert without timestamp, that got current time.

Regarding your original question: it's really look strange, may be youshould file JIRA about this.


Best regards, Vladimir Yudovin,
/Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting/

---- On Tue, 20 Jun 2017 10:19:08 -0400 *Thakrar, Jayesh<jthak...@conversantmedia.com>* wrote ----


    Ok, tried the test again, w/o the TIMESTAMP, and got the expected
    behavior.

    Apparently, the INSERT does replace the entire list if no
    timestamp is specified (as expected).

    However, if the TIMESTAMP is specified, then it does (what appears
    to be) an append.

    But found even more weird issue - see later below!


    ===============================================


    cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);

    cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

    cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

    cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

    cqlsh> SELECT * FROM test.test ;


    *k* | *v*

    ---+-----

    *1* | *[1]*


    (1 rows)

    cqlsh>


    ===============================================



    DROP TABLE IF EXISTS test.test ;

    CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;

    SELECT * FROM test.test ;


    cqlsh> SELECT * FROM test.test ;


    *k* | *v*

    ---+-----

    *1* | *[3]*


    // ===== EXPECTED RESULT =====


    INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;

    SELECT * FROM test.test ;


    cqlsh> SELECT * FROM test.test ;


    *k* | *v*

    ---+-----

    *1* | *[3]*


    // ===== WHY ?? =====



    DROP TABLE IF EXISTS test.test ;

    CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;

    SELECT * FROM test.test ;


    cqlsh> SELECT * FROM test.test ;


    *k* | *v*

    ---+--------------

    *1* | *[10, 20, 30]*


    // ===== WHY ?? Probably the server-timestamp-uuid playing a
    role?! =====


    INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;

    INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;

    SELECT * FROM test.test ;


    cqlsh> SELECT * FROM test.test ;


    *k* | *v*

    ---+-----

    *1* | *[3]*


    // ===== EXPECTED RESULT =====






    *From: *Subroto Barua <sbarua...@yahoo.com
    <mailto:sbarua...@yahoo.com>>
    *Date: *Monday, June 19, 2017 at 11:09 PM
    *To: *"Thakrar, Jayesh" <jthak...@conversantmedia.com
    <mailto:jthak...@conversantmedia.com>>, Subroto Barua
    <sbarua...@yahoo.com.INVALID
    <mailto:sbarua...@yahoo.com.INVALID>>, Zhongxiang Zheng
    <zzh...@yahoo-corp.jp <mailto:zzh...@yahoo-corp.jp>>
    *Cc: *"user@cassandra.apache.org
    <mailto:user@cassandra.apache.org>" <user@cassandra.apache.org
    <mailto:user@cassandra.apache.org>>
    *Subject: *Re: Question: Behavior of inserting a list multiple
    times with same timestamp


    here is the response from Datastax support/dev:



    In a list each item is its own cell. Append adds a new cell sorted
    at basically "current server time uuid" prepend adds at "-current
    server time uuid". User supplied time stamps are used for the cell
    timestamp when specified.

    Inserting the entire list deletes and then inserts

    Reading reads out the entire list

    Positional access reads the entire list and gets/puts at the spot
    specified


    Basically, lists are not idempotent



    On Monday, June 19, 2017, 6:55:40 AM PDT, Thakrar, Jayesh
    <jthak...@conversantmedia.com
    <mailto:jthak...@conversantmedia.com>> wrote:



    Subroto,

    Cassandra docs say otherwise.

    Writing list data is accomplished with a JSON-style syntax. To
    write a record using INSERT, specify the entire list as a JSON
    array. Note: An INSERT will always replace the entire list.

    Maybe you can elaborate/shed some more light?

    Thanks,
    Jayesh


    Lists

    A list is a typed collection of non-unique values where elements
    are ordered by there position in the list. To create a column of
    type list, use the list keyword suffixed with the value type
    enclosed in angle brackets. For example:

    CREATE TABLE plays (
        id text PRIMARY KEY,
        game text,
        players int,
        scores list<int>
    )
    Do note that as explained below, lists have some limitations and
    performance considerations to take into account, and it is advised
    to prefer sets over lists when this is possible.

    Writing list data is accomplished with a JSON-style syntax. To
    write a record using INSERT, specify the entire list as a JSON
    array. Note: An INSERT will always replace the entire list.

    INSERT INTO plays (id, game, players, scores)
              VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
    Adding (appending or prepending) values to a list can be
    accomplished by adding a new JSON-style array to an existing list
    column.

    UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE
    id = '123-afde';
    UPDATE plays SET players = 5, scores = [ 12 ] + scores WHERE id =
    '123-afde';
    It should be noted that append and prepend are not idempotent
    operations. This means that if during an append or a prepend the
    operation timeout, it is not always safe to retry the operation
    (as this could result in the record appended or prepended twice).

    Lists also provides the following operation: setting an element by
    its position in the list, removing an element by its position in
    the list and remove all the occurrence of a given value in the
    list. However, and contrarily to all the other collection
    operations, these three operations induce an internal read before
    the update, and will thus typically have slower performance
    characteristics. Those operations have the following syntax:

UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';// sets the 2nd element of scores to 7 (raises an error is

    scores has less than 2 elements)

DELETE scores[1] FROM plays WHERE id = '123-afde';// deletes the 2nd element of scores (raises an error is scores

    has less than 2 elements)
    UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id =
    '123-afde'; // removes all occurrences of 12 and 21 from scores
    As with maps, TTLs if used only apply to the newly
    inserted/updated values.


    On 6/19/17, 1:12 AM, "Subroto Barua" <sbarua...@yahoo.com.INVALID
    <mailto:sbarua...@yahoo.com.INVALID>> wrote:

        This is an expected behavior.

        We learned this issue/feature at the current site (we use Dse
    5.08)

        Subroto

        > On Jun 18, 2017, at 10:29 PM, Zhongxiang Zheng
    <zzh...@yahoo-corp.jp <mailto:zzh...@yahoo-corp.jp>> wrote:
        >
        > Hi all,
        >
        > I have a question about a behavior when insert a list with
    specifying timestamp.
        >
        > It is documented that "An INSERT will always replace the
    entire list."
        >
    https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile#lists
        >
        > However, When a list is inserted multiple times using same
    timestamp,
        > it will not be replaced, but will be added as follows.
        >
        > cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v
    list<int>);
        > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING
    TIMESTAMP 1000 ;
        > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING
    TIMESTAMP 1000 ;
        > cqlsh> SELECT * FROM test.test ;
        >
        > k | v
        > ---+--------
        > 1 | [1, 1]
        >
        > I confirmed this behavior is reproduced in 3.0.13 and 3.10.
        > I'd like to ask whether this behavior is a expected behavior
    or a bug?
        >
        > In our use case, CQL statements with same values and
    timestamp will be issued multiple times
        > to retry inserting under the assumption that insert is
    idempotent.
        > So, I expect that the entire list will be replace even if
    insert a list multiple times with same timestamp.
        >
        > Thanks,
        >
        > Zhongxiang
        >
        >
        >
    Ð¢ÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÐÐ¥Fò
    Vç7V'67&–&RÂRÖÖ–Ã¢
    W6W"×Vç7V'67&–&T676æG&æ6†Ræ÷&pÐ¤f÷"FF—F–öæÂ6öÖÖæG2ÂRÖÖ–Ã¢
    W6W"Ö†VÇ676æG&æ6†Ræ÷&pÐ






    
BKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKCB[XܚXKK[XZ[\\][XܚXP\[K\XKܙB܈Y][ۘ[[X[K[XZ[\\Z[\[K\XKܙB

Re: Question: Behavior of inserting a list multiple times with same timestamp

Reply via email to