Thanks for the information. George.
On Wed, Apr 10, 2019 at 3:14 AM Alok Dwivedi <alok.dwiv...@instaclustr.com> wrote: > Your delete query > > "DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 'A1';”. > > will generate multi row range tombstones. Since you are reading entire > partition which effectively will be read in pages (slice query equivalent) > you may get tombstones in certain pages depending upon how much deletes you > are doing. However looking at your use case I don’t think you will end with > very high ratio of deleted to live data so normal deletes should be fine as > is already pointed out below. Note that range tombstones are more effective > storage space wise as they have start/end range rather than deleted info > for every deleted row. So I also don’t think your workaround of using > ‘active’ flag is really needed unless its for auditing. Another thing to > note is if you have a use case where you want to be more aggressive in > evicting tombstones then here are some settings worth exploring > - tombstone_threshold > - unchecked_tombstone_compaction > -tombstone_compaction_interval > Additionally gc_grace_seconds can be looked at but it must be handled very > carefully as we must ensure that repair completes in an interval less than > this setting to prevent any deleted data reappearing. > > Regards > Alok > > > On 9 Apr 2019, at 15:56, Jon Haddad <j...@jonhaddad.com> wrote: > > Normal deletes are fine. > > Sadly there's a lot of hand wringing about tombstones in the generic > sense which leads people to try to work around *every* case where > they're used. This is unnecessary. A tombstone over a single row > isn't a problem, especially if you're only fetching that one row back. > Tombstones can be quite terrible under a few conditions: > > 1. When a range tombstone shadows hundreds / thousands / millions of > rows. This wasn't even detectable prior to Cassandra 3 unless you > were either looking for it specifically or were doing CPU profiling: > > http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html > <https://urldefense.proofpoint.com/v2/url?u=http-3A__thelastpickle.com_blog_2018_07_05_undetectable-2Dtombstones-2Din-2Dapache-2Dcassandra.html&d=DwMFaQ&c=0YLnzTkWOdJlub_y7qAx8Q&r=rLBSjDttDAvJ4dNFETxOxW7tkSzdYwi8u8cKGod45fQ&m=P5LBsQs63k7WyPdn_oNSktA-LysOkndY9ur93lYKgwY&s=UWdZkSOzMZSzoOqC7gp1euyRNifD-RO_5fnX4y78k0c&e=> > 2. When rows were frequently created then deleted, and scanned over. > This is the queue pattern that we detest so much. > 3. When they'd be created as a side effect from over writing > collections. This is an accident typically. > > The 'active' flag is good if you want to be able to go back and look > at old deleted assignments. If you don't care about that, use a > normal delete. > > Jon > > On Tue, Apr 9, 2019 at 7:00 AM Li, George <guangxing...@pearson.com> > wrote: > > > Hi, > > I have a table defined like this: > > CREATE TABLE myTable ( > course_id text, > assignment_id text, > assignment_item_id text, > data text, > boolean active, > PRIMARY KEY (course_id, assignment_id, assignment_item_id) > ); > i.e. course_id as the partition key and assignment_id, assignment_item_id > as clustering keys. > > After data is populated, some delete queries by course_id and > assignment_id occurs, e.g. "DELETE FROM myTable WHERE course_id = 'C' AND > assignment_id = 'A1';". This would create tombstones so query "SELECT * > FROM myTable WHERE course_id = 'C';" would be affected, right? Would query > "SELECT * FROM myTable WHERE course_id = 'C' AND assignment_id = 'A2';" be > affected too? > > For query "SELECT * FROM myTable WHERE course_id = 'C';", to workaround > the tombstone problem, we are thinking about not doing hard deletes, > instead doing soft deletes. So instead of doing "DELETE FROM myTable WHERE > course_id = 'C' AND assignment_id = 'A1';", we do "UPDATE myTable SET > active = false WHERE course_id = 'C' AND assignment_id = 'A1';". Then in > the application, we do query "SELECT * FROM myTable WHERE course_id = 'C';" > and filter out records that have "active" equal to "false". I am not really > sure this would improve performance because C* still has to scan through > all records with the partition key "C". It is just instead of scanning > through X records + Y tombstone records with hard deletes that generate > tombstones, it now scans through X + Y records with soft deletes and no > tombstones. Am I right? > > Thanks. > > George > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > > >