Hi, I have a table A where one of the fields is a text column called body. The length of this text varies between roughly 120 and, say, 400 characters, and the contents of this column can be identical across millions of rows.
To avoid storing the same data repeatedly, I thought I would add another table B that stores <MD5Hash(body), body>:

Table A ( some fields; ... digest text, ... )

Table B ( digest text, body text, PRIMARY KEY (digest) )

Whenever I insert into table A, I calculate the digest of body and blindly insert into table B as well; I never read from B. This could result in the same <digest, body> pair being inserted millions of times in a short span of time. A couple of questions:

1) Would this cause an issue due to the number of tombstones created in a short span of time? I'm assuming that every insert creates a tombstone for the previous record.

2) Or should I just replicate the same data in table A itself multiple times (with compression, space isn't that big an issue)?

- Sanjeeth
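P.S. For concreteness, the digest I'm describing is just the hex MD5 of the body, which becomes the primary key of table B. A minimal Python sketch (the actual driver call that inserts into B is omitted; `body_digest` is just an illustrative helper name):

```python
import hashlib

def body_digest(body: str) -> str:
    """Hex MD5 of the body text, used as the primary key of table B."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()

# Identical bodies always produce the same digest, so every repeated
# insert into B targets the same partition key rather than creating
# a new row.
d1 = body_digest("same body text")
d2 = body_digest("same body text")
# d1 == d2, and each digest is a 32-character hex string
```

Since the digest is the primary key, re-inserting the same <digest, body> pair should just be an upsert on the same row in B.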