Hi all,

Thanks to all for the info.

I think Nate's suggestion was what I was trying to articulate in my
question.

Just to confirm: if I add a timeuuid to the primary key as a clustering
column and reverse the clustering order, so the newest row is stored first,
I can query by just the partition key with a limit of 1. That way it should
be able to grab the latest version of the row without hitting any
tombstones. Is that correct?

I don't really need Paxos any more. I have been able to limit it to one
update thread per partition.

Thanks again.

Charlie

On 28 May 2014 16:44, "Nate McCall" <n...@thelastpickle.com> wrote:

> You could turn gc_grace_seconds down to zero and tune compaction options
> for this CF to keep the tombstone count down.
>
> But...
>
> This query looks a lot like a ledger. If that is so, treat it as such and
> skip the updates by:
> - modifying the schema to include a timeuuid as part of a compound key
> (and using that timeuuid for order)
> - select the most recent via limit 1
> - would you even need paxos at this point? (please read Pat Helland's
> Building on Quicksand:
> http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf
> particularly section 6.2)
> - use a TTL to keep the table tame if it's high volume
>
> This 'immutable' approach plays much nicer with Cassandra's strong points.
>
>
> On Sun, May 25, 2014 at 2:01 PM, Charlie Mason <charlie....@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a table which has one column per user. It receives a lot of
>> updates to these columns throughout its lifetime. They are always updates
>> on a few specific columns. Firstly, is Cassandra storing a tombstone for
>> each of these old column values?
>>
>> I have run a simple select and seen the following tracing results:
>>
>>  activity                                                                                  | timestamp    | source    | source_elapsed
>> -------------------------------------------------------------------------------------------+--------------+-----------+----------------
>>  execute_cql3_query                                                                        | 19:48:36,582 | 127.0.0.1 |              0
>>  Parsing SELECT Account, Balance FROM AccountBalances WHERE Account = 'test9' LIMIT 10000; | 19:48:36,582 | 127.0.0.1 |             56
>>  Preparing statement                                                                       | 19:48:36,582 | 127.0.0.1 |            181
>>  Executing single-partition query on accountbalances                                       | 19:48:36,583 | 127.0.0.1 |            878
>>  Acquiring sstable references                                                              | 19:48:36,583 | 127.0.0.1 |            895
>>  Merging memtable tombstones                                                               | 19:48:36,583 | 127.0.0.1 |            918
>>  Key cache hit for sstable 569                                                             | 19:48:36,583 | 127.0.0.1 |            997
>>  Seeking to partition beginning in data file                                               | 19:48:36,583 | 127.0.0.1 |           1034
>>  Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones                 | 19:48:36,583 | 127.0.0.1 |           1383
>>  Merging data from memtables and 1 sstables                                                | 19:48:36,583 | 127.0.0.1 |           1402
>>  Read 1 live and 123780 tombstoned cells                                                   | 19:48:36,710 | 127.0.0.1 |         128631
>>  Request complete                                                                          | 19:48:36,711 | 127.0.0.1 |         129276
>>
>>
>> As you can see, that's an awful lot of tombstoned cells. That's after a
>> full compaction as well. Just so you are aware, this table is updated
>> using a Paxos IF statement.
>>
>> It still seems fairly snappy; however, I am concerned it's only going to
>> get worse.
>>
>> Would I be better off adding a time-based key to the primary key, then
>> doing a separate insert and deleting the original? If I did the query
>> with a limit of one, it should always find the first row before hitting
>> a tombstone. Is that correct?
>>
>> Thanks,
>>
>> Charlie M
>>
>>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
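
For reference, the ledger-style layout discussed in this thread could be sketched in CQL roughly as follows. This is only an illustration under assumptions: the table name account_balance_ledger, the column name version, the decimal type for the balance, and the 30-day TTL are all made up here and do not come from the original schema.

```cql
-- Hypothetical ledger table: one row per balance change, never updated in place.
CREATE TABLE account_balance_ledger (
    account text,        -- partition key, as in the original AccountBalances query
    version timeuuid,    -- assumed name for the time-based clustering column
    balance decimal,     -- assumed type; the original column type is not shown
    PRIMARY KEY (account, version)
) WITH CLUSTERING ORDER BY (version DESC);  -- newest row is stored first

-- Each change is a fresh insert. now() generates the timeuuid on the coordinator.
-- The TTL (30 days here, an arbitrary value) keeps old entries from piling up.
INSERT INTO account_balance_ledger (account, version, balance)
VALUES ('test9', now(), 100.00)
USING TTL 2592000;

-- The latest balance is the first row of the partition.
SELECT account, balance
FROM account_balance_ledger
WHERE account = 'test9'
LIMIT 1;
```

With the clustering order reversed, the LIMIT 1 read should stop at the first live row, which sits at the start of the partition, so older or expired entries further along are not scanned. One caveat with the TTL: if an account receives no inserts for longer than the TTL, its newest row expires as well, so the TTL would need to be longer than the longest expected gap between updates, or the latest row would need to be rewritten periodically.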