Hi,
   I have a table A, one of whose fields is a text column called body.
The length of this text varies between roughly 120 and 400 characters, and
the same content can appear in millions of rows.

To avoid storing the same data repeatedly, I thought I would add another
table B, which stores <MD5Hash(body), body>.

TABLE A (
    some fields,
    ...
    digest text,
    ...
)

TABLE B (
    digest text,
    body text,
    PRIMARY KEY (digest)
)

Whenever I insert into table A, I calculate the digest of body and blindly
issue an insert into table B as well; I never read from B first. This could
result in the same <digest, body> pair being inserted millions of times in a
short span of time.
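For concreteness, the per-insert flow I have in mind looks roughly like this
(a Python sketch; the MD5 hex digest is the key of B, and the INSERT
statements in the comments are illustrative, not my actual code):

```python
import hashlib

def digest_of(body: str) -> str:
    # Hex MD5 digest of the body text; used as the primary key of table B.
    return hashlib.md5(body.encode("utf-8")).hexdigest()

# On every insert into A, I also blindly upsert into B (no read of B first),
# e.g. with a driver session along these lines:
#
#   session.execute("INSERT INTO B (digest, body) VALUES (%s, %s)",
#                   (digest_of(body), body))
#   session.execute("INSERT INTO A (..., digest) VALUES (..., %s)",
#                   (digest_of(body),))
```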

A couple of questions:

1) Would this cause an issue due to the number of tombstones created in a
short span of time? I'm assuming that every insert creates a tombstone for
the previous record.
2) Or should I just replicate the same data in table A itself multiple
times (with compression, is space really that big an issue)?


- Sanjeeth
