It's not about tombstones. Tombstones are effectively markers for deleted columns (deleted explicitly or expired via TTL). Compaction carries them over into the new SSTables so that the deletion is retained for the gc_grace period.
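A toy model may make the difference between deletes and repeated upserts concrete. This is only a sketch, not Cassandra's storage code: the names Cell, ToyTable, reconcile and tombstone_count are made up for illustration, a tombstone is modelled as a cell whose value is None, ties on timestamp simply go to the newer write, and the toy compaction never purges tombstones (real compaction drops them once the gc_grace period has passed).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None stands in for a tombstone (hypothetical model)
    timestamp: int


def reconcile(old: Optional[Cell], new: Cell) -> Cell:
    """Keep the cell with the later timestamp; ties go to the newer write."""
    if old is None or new.timestamp >= old.timestamp:
        return new
    return old


class ToyTable:
    """One memtable plus a list of immutable 'sstables' (plain dicts)."""

    def __init__(self) -> None:
        self.memtable: Dict[str, Cell] = {}
        self.sstables: List[Dict[str, Cell]] = []

    def upsert(self, key: str, value: str, timestamp: int) -> None:
        # An upsert only writes a new cell; nothing marks the old one deleted.
        self.memtable[key] = reconcile(self.memtable.get(key), Cell(value, timestamp))

    def delete(self, key: str, timestamp: int) -> None:
        # Only a delete writes a tombstone.
        self.memtable[key] = reconcile(self.memtable.get(key), Cell(None, timestamp))

    def flush(self) -> None:
        # Only the latest version per key is in the memtable, so only that is flushed.
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def compact(self) -> None:
        # Merge all sstables, keeping the latest cell per key.
        merged: Dict[str, Cell] = {}
        for sstable in self.sstables:
            for key, cell in sstable.items():
                merged[key] = reconcile(merged.get(key), cell)
        self.sstables = [merged] if merged else []

    def tombstone_count(self) -> int:
        tables = self.sstables + [self.memtable]
        return sum(1 for t in tables for c in t.values() if c.value is None)


table = ToyTable()
for ts in range(1, 1001):
    table.upsert("digest-1", "same body", ts)  # same <digest, body> every time
    if ts % 250 == 0:
        table.flush()
table.compact()
print(len(table.sstables), table.tombstone_count())  # prints "1 0"

table.delete("digest-1", 1001)
table.flush()
print(table.tombstone_count())  # prints "1"
```

In this model a thousand identical upserts collapse to a single cell after compaction and never produce a tombstone. The only tombstone comes from the explicit delete.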
Updates do not create tombstones for previous records. The version with the latest timestamp is the one that is kept, whether it is flushed from the memtable or merged from SSTables during compaction. While data is in the memtable, the latest timestamp wins, so only the latest version is flushed to disk. After that, everything depends on how often you flush memtables and how compaction proceeds. Do not expect any tombstones from updates; they appear only when you delete columns.

Best regards,
Viktor Jevdokimov
Senior Developer

From: Sanjeeth Kumar [mailto:sanje...@exotel.in]
Sent: Wednesday, January 22, 2014 5:37 AM
To: user@cassandra.apache.org
Subject: Upserting the same values multiple times

Hi,

I have a table A, one of whose fields is a text column called body. The length of this text can vary from about 120 to about 400 characters. The contents of this column can be the same for millions of rows. To avoid repeating the same data, I thought I would add another table B, which stores <MD5Hash(body), body>.

Table A {
    some fields;
    ....
    digest text,
    .....
}

TABLE B (
    digest text,
    body text,
    PRIMARY KEY (digest)
)

Whenever I insert into table A, I calculate the digest of body and blindly insert into table B as well. I'm not doing any reads on B. This could result in the same <digest, body> being inserted millions of times in a short span of time.

A couple of questions:

1) Would this cause an issue due to the number of tombstones created in a short span of time? I'm assuming that for every insert, a tombstone would be created for the previous record.
2) Or should I just replicate the same data in table A itself multiple times (with compression, space isn't that big an issue)?

- Sanjeeth