There is no regression here yet; however, static columns are turning out to be
insanely useful (perhaps for reasons other than originally intended), so I am
curious what the general approach is to changes in what is essentially a very
new feature.

In this context, I am showing something that works as intended today (at the
bottom, hopefully in Courier New), which I think is sufficiently interesting
to be added as a regression test (assuming people agree it is valid).

That is the long and short of my question; for more context, this is what I am
trying to do:

=====

Background: We store millions of, let’s say, “records” in C*, and as we stream
data through our systems we like to work with timestamped deltas to these
records (let’s say changes to a subset of fields). We want to be able to
recreate the data at a specific point in time (so we munge sequences of deltas
read from Cassandra together), both because historical state is a use case and
because we need MVCC, so the “readers” don’t actually want to see the latest
deltas, since those may be logically in the future compared to the readers’
“now”.

Currently, we start with a full version of the record in the first
row (/clustering column value) for the record and add deltas forward in time.
This is bad because you have to munge a lot of deltas to get to the state near
now; and because the deltas and the initial state are naturally immutable,
there is little point in using a map, since we do no map-like operations on
them.
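
For reference, a minimal sketch of that current forward-in-time layout (the
table and column names here are hypothetical, just for contrast with the
backwards version further down):

// hypothetical sketch of the current layout: the full state goes in the first
// row, and forward deltas are appended in subsequent rows
CREATE TABLE forwards_time_series (
    id text,
    update_time bigint,
    delta_json text,
    PRIMARY KEY (id, update_time)
) WITH CLUSTERING ORDER BY (update_time ASC);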

With a map in a static column, we can apply map operations to update the
“latest” state, and use the rows to store immutable “backwards” deltas that
take you from a newer state to an older state. Note that now you can even TTL
old deltas, and for MVCC of a reader’s current view, you can sort by decreasing
timestamp and select only very few (mostly one, perhaps) rows to get from the
“latest” state back to that reader’s “now” state.
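
To make the TTL and MVCC points concrete, something along these lines ought to
work against the backwards_time_series table defined in the transcript below
(the 30-day TTL and the reader’s “now” of 150 are made-up values, purely for
illustration):

// expire an old backwards delta after 30 days (2592000 seconds); hypothetical TTL
UPDATE backwards_time_series USING TTL 2592000
    SET backwards_delta_json = '{"field2":"b"}'
    WHERE id = 'id1' AND update_time = 200;

// MVCC read: the static "latest" state comes back alongside each row, plus only
// the few deltas newer than the reader's "now" (assumed here to be 150)
SELECT * FROM backwards_time_series WHERE id = 'id1' AND update_time >= 150;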

All this is very nice and sweet and works.

So, as I said above… I’d kind of like to use this (in something we are
replacing anyway), but I would like anyone’s opinion on whether the following
(particularly the UPDATE statements that insert rows and make non-replacing
updates to the static map) is entirely reasonable and something we should
consider valid syntax.

Thanks,

Graham.

cqlsh:test> CREATE TABLE backwards_time_series ( id text, update_time bigint,
                backwards_delta_json text, latest_state map<text, text> static,
                PRIMARY KEY (id, update_time) ) WITH CLUSTERING ORDER BY (update_time DESC);
cqlsh:test> 
cqlsh:test> // initial insert of row and static column value for partition id = 'id1';
cqlsh:test> INSERT INTO backwards_time_series (id, update_time, latest_state)
                VALUES ( 'id1', 100, { 'field1' : 'a', 'field2' : 'b', 'field3' : 'c'});
cqlsh:test> 
cqlsh:test> // Because we are going backwards in time we generally want a lower bound near to now, so including that as part of the test
cqlsh:test> SELECT * from backwards_time_series WHERE id = 'id1' AND UPDATE_TIME >= 100;

 id  | update_time | latest_state                                  | backwards_delta_json
-----+-------------+-----------------------------------------------+----------------------
 id1 |         100 | {'field1': 'a', 'field2': 'b', 'field3': 'c'} |                 null

(1 rows)

cqlsh:test> 
cqlsh:test> // Use map operation on static column to update latest state of field2 while inserting a new regular row with clustering key update_time = 200
cqlsh:test> UPDATE backwards_time_series SET backwards_delta_json = '{"field2":"b"}',
                latest_state['field2'] = 'newb' WHERE id='id1' AND update_time = 200;
cqlsh:test> 
cqlsh:test> SELECT * from backwards_time_series WHERE id = 'id1' AND UPDATE_TIME >= 100;

 id  | update_time | latest_state                                     | backwards_delta_json
-----+-------------+--------------------------------------------------+----------------------
 id1 |         200 | {'field1': 'a', 'field2': 'newb', 'field3': 'c'} |       {"field2":"b"}
 id1 |         100 | {'field1': 'a', 'field2': 'newb', 'field3': 'c'} |                 null

(2 rows)

cqlsh:test> 
cqlsh:test> // Same as above, but we update an existing map key's value, add a new map key/value, and delete an existing map key, all in the same operation while inserting the new row
cqlsh:test> UPDATE backwards_time_series SET backwards_delta_json = '.. omitted for brevity ..',
                latest_state['field3'] = 'newc', latest_state['field4'] = 'value4',
                latest_state['field2'] = null WHERE id='id1' and update_time = 300;
cqlsh:test> 
cqlsh:test> SELECT * from backwards_time_series WHERE id = 'id1' AND UPDATE_TIME >= 100;

 id  | update_time | latest_state                                          | backwards_delta_json
-----+-------------+-------------------------------------------------------+---------------------------
 id1 |         300 | {'field1': 'a', 'field3': 'newc', 'field4': 'value4'} | .. omitted for brevity ..
 id1 |         200 | {'field1': 'a', 'field3': 'newc', 'field4': 'value4'} |            {"field2":"b"}
 id1 |         100 | {'field1': 'a', 'field3': 'newc', 'field4': 'value4'} |                      null

(3 rows)

