Ah, makes sense. Thanks for the explanations! - Ian
On Tue, Dec 16, 2014 at 10:53 AM, Robert Wille <rwi...@fold3.com> wrote:

> Tombstones have to be created. The SSTables are immutable, so the data
> cannot be deleted. Therefore, a tombstone is required. The value you
> deleted will be physically removed during compaction.
>
> My workload sounds similar to yours in some respects, and I was able to
> get C* working for me. I have large chunks of data which I periodically
> replace. I write the new data, update a reference, and then delete the
> old data. I designed my schema to be tombstone-friendly, and C* works
> great. For some of my tables I am able to delete entire partitions.
> Because of the reference that I updated, I never try to access the old
> data, and therefore the tombstones for these partitions are never read.
> The old data simply has to wait for compaction. Other tables require
> deleting records within partitions. These tombstones do get read, so
> there are performance implications. I was able to design my schema so
> that no partition ever has more than a few tombstones (one for each
> generation of deleted data, which is usually no more than one).
>
> Hope this helps.
>
> Robert
>
> On Dec 16, 2014, at 8:22 AM, Ian Rose <ianr...@fullstory.com> wrote:
>
> Howdy all,
>
> Our use of cassandra unfortunately makes use of lots of deletes. Yes, I
> know that C* is not well suited to this kind of workload, but that's
> where we are, and before I go looking for an entirely new data layer I
> would rather explore whether C* could be tuned to work well for us.
>
> However, deletions are never driven by users in our app - deletions
> always occur by backend processes to "clean up" data after it has been
> processed, and thus they do not need to be 100% available. So this made
> me think, what if I did the following?
> - gc_grace_seconds = 0, which ensures that tombstones are never created
> - replication factor = 3
> - for writes that are inserts, consistency = QUORUM, which ensures
>   that writes can proceed even if 1 replica is slow/down
> - for deletes, consistency = ALL, which ensures that when we delete a
>   record it disappears entirely (no need for tombstones)
> - for reads, consistency = QUORUM
>
> Also, I should clarify that our data is essentially append only, so I
> don't need to worry about inconsistencies created by partial updates
> (e.g. value gets changed on one machine but not another). Sometimes
> there will be duplicate writes, but I think that should be fine since
> the value is always identical.
>
> Any red flags with this approach? Has anyone tried it and have
> experiences to share? Also, I *think* that this means that I don't need
> to run repairs, which from an ops perspective is great.
>
> Thanks, as always,
> - Ian
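For anyone following along: Robert's point about why deletes must create tombstones can be illustrated with a toy LSM-style store. This is a hand-rolled sketch, not Cassandra's actual storage code; the `SSTable`, `ToyStore`, `flush`, and `compact` names here are invented for illustration, and it ignores gc_grace entirely.

```python
# Toy model of LSM-style storage: SSTables are immutable, so a delete
# cannot remove data in place. Instead it appends a tombstone marker
# that shadows older values until compaction drops both.
TOMBSTONE = object()

class SSTable:
    """An immutable map of key -> (timestamp, value), frozen at flush time."""
    def __init__(self, entries):
        self._entries = dict(entries)
    def get(self, key):
        return self._entries.get(key)
    def items(self):
        return self._entries.items()

class ToyStore:
    def __init__(self):
        self.sstables = []

    def flush(self, writes):
        """Flush a batch of writes (a 'memtable') to a new immutable SSTable."""
        self.sstables.append(SSTable(writes))

    def read(self, key):
        """Merge on read: the entry with the newest timestamp wins."""
        best = None
        for sst in self.sstables:
            entry = sst.get(key)
            if entry and (best is None or entry[0] > best[0]):
                best = entry
        if best is None or best[1] is TOMBSTONE:
            return None  # deleted: the tombstone shadows older values
        return best[1]

    def compact(self):
        """Merge all SSTables into one; tombstones and the data they
        shadow are physically removed (gc_grace ignored in this toy)."""
        merged = {}
        for sst in self.sstables:
            for key, entry in sst.items():
                if key not in merged or entry[0] > merged[key][0]:
                    merged[key] = entry
        live = {k: e for k, e in merged.items() if e[1] is not TOMBSTONE}
        self.sstables = [SSTable(live)]

store = ToyStore()
store.flush({"row1": (1, "old data")})   # initial write, now immutable
store.flush({"row1": (2, TOMBSTONE)})    # a delete is just another write
print(store.read("row1"))                # None: tombstone wins the merge
store.compact()
print(len(store.sstables[0]._entries))   # 0: only now is the data gone
```

The key takeaway is the last two lines: before compaction the "deleted" value still exists on disk and is hidden only by the tombstone, which is why tombstone-heavy partitions hurt reads.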
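The consistency arithmetic behind Ian's proposed scheme can be sanity-checked with the standard R + W > RF overlap rule. A minimal sketch (nothing Cassandra-specific; `quorum` and `reads_see_writes` are illustrative helpers, not driver APIs):

```python
def quorum(rf):
    """QUORUM is a majority of replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1

def reads_see_writes(rf, write_cl, read_cl):
    """Reads are guaranteed to overlap writes iff R + W > RF: every
    read set must then include at least one replica that acked the write."""
    return write_cl + read_cl > rf

RF = 3
W_INSERT = quorum(RF)   # 2: inserts survive one slow/down replica
W_DELETE = RF           # ALL = 3: every replica must apply the delete
R = quorum(RF)          # 2

print(quorum(3))                           # 2
print(reads_see_writes(RF, W_INSERT, R))   # True: QUORUM write + QUORUM read
print(reads_see_writes(RF, W_DELETE, R))   # True: ALL write + QUORUM read

# The catch with gc_grace_seconds = 0: a delete that fails at ALL must be
# retried until every replica accepts it, because there is no tombstone
# window during which repair could propagate the delete to a replica
# that missed it -- otherwise the deleted row can "resurrect" on reads.
```

This only checks the overlap math; it says nothing about the operational risk in the final comment, which is the main thing to weigh before skipping repairs.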