I must have accidentally deleted all messages in this thread save this one.

At face value, we are talking about saving 2 bytes per column. I know it can add up with many columns, but relative to the size of the column -- is it THAT significant?

I made an effort to minimize my CF footprint by replacing the "natural" column keys with integers (and translating back and forth on writes and reads). In my case it's easy to see that this yields at least 30% storage savings, and close to 50% in the best case. But if the column in question holds more than 20 bytes -- what's the point of trying to save 2?
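As a back-of-the-envelope sketch of that arithmetic (the key name, the 20-byte value, and the ~15 bytes of fixed per-column overhead are my assumptions, not real figures from my schema):

```java
// Sketch: storage saved by mapping a "natural" string column key to an integer.
// All concrete names and sizes below are hypothetical examples.
public class KeySavings {
    // Bytes saved per column when a UTF-8 key is replaced by a 4-byte int.
    static int savedBytes(String naturalKey) {
        int natural = naturalKey.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
        return natural - 4; // assume the integer key serializes to 4 bytes
    }

    // Fraction of the whole column saved, given value size and an assumed
    // fixed per-column overhead (timestamp, lengths, flags).
    static double savingFraction(int keyBytes, int valueBytes, int fixedOverhead) {
        return (keyBytes - 4) / (double) (keyBytes + valueBytes + fixedOverhead);
    }
}
```

For a hypothetical 20-byte key like "last_login_timestamp" with a 20-byte value and ~15 bytes of fixed overhead, the key translation saves 16 bytes, roughly 29% of the column -- while the 2 bytes under discussion here would be about 4%.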

Cheers

Maxim


On 1/18/2012 11:49 PM, Ertio Lew wrote:
I believe the timestamps *on a per-column basis* are only required until
compaction; after that, the timestamp range could be specified globally,
per SSTable. The per-column timestamps before compaction then only need
to measure the time from the initialization of the new memtable to the
point the column is written to that memtable, and that interval easily
fits in 4 bytes. I believe this would save at least 4 bytes of overhead
per column.
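As a rough sketch of what I mean (the names and the one-second granularity are illustrative, not anything from Cassandra's actual code):

```java
// Sketch of the idea above: store per-column timestamps as a 32-bit offset
// from the memtable's creation time instead of a full 64-bit timestamp.
// The full 8-byte base is kept once per memtable/SSTable, saving 4 bytes
// per column. Class and field names are illustrative only.
public class RelativeTimestamp {
    final long memtableBaseMicros; // stored once, not per column

    RelativeTimestamp(long baseMicros) {
        this.memtableBaseMicros = baseMicros;
    }

    // Seconds since memtable creation fit easily in 4 bytes:
    // 2^31 seconds is ~68 years, while memtables live minutes to hours.
    int encode(long writeMicros) {
        return (int) ((writeMicros - memtableBaseMicros) / 1_000_000L);
    }

    // Reconstruct an absolute timestamp (at one-second granularity).
    long decode(int offsetSeconds) {
        return memtableBaseMicros + offsetSeconds * 1_000_000L;
    }
}
```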

Is anything related to these overheads under consideration, or planned
on the roadmap?



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev<olega...@gmail.com>  wrote:
I have a patch for trunk which I just have to get time to test a bit before I
submit.
It is for super columns and will use the super columns timestamp as the base
and only store variant encoded offsets in the underlying columns.
Could you please measure how much real benefit it brings (in actual RAM
consumed by the JVM)? It is hard to tell whether it will give noticeable
results. AFAIK the memory structures used for the memtable consume much
more memory, and a 64-bit JVM allocates memory aligned to 64-bit word
boundaries, so a 37% reduction in memory consumption looks doubtful.
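For what it's worth, I take "variant encoded offsets" in the patch description above to mean something like a base-128 varint: small deltas from the super column's base timestamp take one or two bytes on disk instead of a fixed eight. A generic sketch of that encoding (this is my guess at the technique, not the patch's actual code, and it handles non-negative deltas only -- negative ones would need something like zigzag first):

```java
import java.io.ByteArrayOutputStream;

// Sketch of variable-length ("varint") encoding for timestamp offsets:
// the delta from a shared base timestamp is written 7 bits per byte,
// low-order group first, with the high bit marking "more bytes follow".
// Small offsets therefore take 1-2 bytes instead of a fixed 8.
public class VarintOffset {
    static byte[] encode(long offset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((offset & ~0x7FL) != 0) {
            out.write((int) ((offset & 0x7F) | 0x80)); // continuation bit set
            offset >>>= 7;
        }
        out.write((int) offset); // final byte, high bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }
}
```

So an offset under 128 costs 1 byte, under 16384 costs 2 bytes, and so on -- which is where the claimed savings over a full 8-byte timestamp would come from.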


