Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Kevin Burton
The general idea is that for HTML content, you want pages from the same domain to be adjacent on disk. That way the repeated runs of HTML template markup compress REALLY well. I think in our situation we would see exceptional compression. If we get closer to this, I'll just implement snappy+bmdiff...
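To make the adjacency argument concrete, here is a minimal Python sketch. It is not Cassandra code, and every piece is a stand-in: `zlib` replaces Snappy/bmdiff, compressing fixed groups of consecutive pages imitates block-compressed storage, and sorting by the MD5 of the URL imitates a hash partitioner (RandomPartitioner places keys by MD5). The hypothetical `make_pages` helper gives every domain a shared 2 KB template plus a small unique body; the template is random bytes so that different domains share nothing.

```python
import hashlib
import random
import zlib

def make_pages(num_domains=20, pages_per_domain=5, seed=0):
    """Hypothetical corpus: each domain has a shared 2 KB template plus a unique body."""
    rng = random.Random(seed)
    pages = []
    for d in range(num_domains):
        # Random bytes, so templates from different domains have nothing in common.
        template = bytes(rng.randrange(256) for _ in range(2048))
        for p in range(pages_per_domain):
            body = bytes(rng.randrange(256) for _ in range(64))
            url = f"https://site{d}.example/page{p}"
            pages.append((url, template + body))
    return pages

def compressed_size(pages, pages_per_chunk=4):
    """Compress consecutive pages in fixed-size chunks and return the total size."""
    total = 0
    for start in range(0, len(pages), pages_per_chunk):
        chunk = b"".join(html for _url, html in pages[start:start + pages_per_chunk])
        total += len(zlib.compress(chunk, 6))
    return total

pages = make_pages()
# Ordered-partitioner stand-in: URL order keeps each domain's pages together.
by_domain = sorted(pages)
# Hash-partitioner stand-in: MD5 order scatters a domain's pages across chunks.
by_hash = sorted(pages, key=lambda p: hashlib.md5(p[0].encode()).digest())

print("domain order:", compressed_size(by_domain))
print("hash order:  ", compressed_size(by_hash))
```

Under these assumptions the domain-ordered layout should come out much smaller, because most chunks contain several copies of the same template, while the hash-ordered layout mostly places unrelated templates side by side. Real HTML also shares markup across domains, so the gap would probably be narrower in practice.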

Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Robert Coli
On Sat, May 17, 2014 at 10:25 PM, Kevin Burton wrote:
> "compression" … sure... but bmdiff? Not that I can find. BMDiff is an
> algorithm that in some situations could result in 10x compression due
> to the way it's able to find long common runs. This is a pathological
> case though. But i

Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Kevin Burton
"Compression" … sure, but bmdiff? Not that I can find. BMDiff is an algorithm that in some situations can yield 10x compression because of the way it finds long common runs. This is a pathological case, though. But if you were to copy the US Constitution into itself … 10x… bm
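A rough Python sketch of the Bentley-McIlroy "long common strings" idea behind bmdiff may help show where the 10x comes from. The names `bm_compress`/`bm_decompress` and the op format are hypothetical. As in the original scheme, fingerprints are stored only at block-aligned offsets and every position is checked against them, but this sketch looks up the raw block bytes in a dict instead of using a rolling Karp-Rabin hash, extends matches forward only, and leaves its output uncompressed (a snappy+bmdiff pipeline would presumably run a fast compressor over it afterwards).

```python
def bm_compress(data: bytes, block: int = 32):
    """Sketch of Bentley-McIlroy long-common-string compression.

    Returns a list of ops: ("lit", bytes) or ("copy", src_offset, length).
    """
    table = {}        # block bytes -> first aligned offset where they occur
    ops = []
    next_reg = 0      # next block-aligned offset to add to the table
    lit_start = 0     # start of the pending literal run
    i = 0
    n = len(data)
    while i + block <= n:
        # Register every aligned block that lies entirely before position i.
        while next_reg + block <= i:
            table.setdefault(data[next_reg:next_reg + block], next_reg)
            next_reg += block
        j = table.get(data[i:i + block])
        if j is None:
            i += 1  # no match here: this byte stays in the literal run
            continue
        # Extend the match forward as far as the bytes agree.
        length = block
        while i + length < n and data[j + length] == data[i + length]:
            length += 1
        if lit_start < i:
            ops.append(("lit", data[lit_start:i]))
        ops.append(("copy", j, length))
        i += length
        lit_start = i
    if lit_start < n:
        ops.append(("lit", data[lit_start:]))
    return ops

def bm_decompress(ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        else:
            _, src, length = op
            # Copy byte by byte so a source range that overlaps the output still works.
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)
```

For a text longer than `block` with no internal repeats of `block` bytes, `bm_compress(text * 10)` should produce one literal for the first copy and a single copy op covering the other nine, which is the "copy the document into itself" case.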

Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Colin
Cassandra offers compression out of the box. Look into the options available at table creation. Using the ordered partitioner is an anti-pattern 999 times out of 1000: it creates hot spots. Wide rows can often accomplish the same result through clustering columns. -- Colin
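A minimal Python model of the wide-row suggestion, assuming a table whose partition key is the domain and whose clustering column is the URL. Partitions are placed by hashing the key (MD5 here, as RandomPartitioner does), and rows inside a partition stay sorted by the clustering column, so one domain's pages remain together without ordering the whole keyspace. The helpers `token` and `layout` are hypothetical, and assigning nodes by `token % num_nodes` is a deliberate simplification of Cassandra's token ring.

```python
import hashlib
from collections import defaultdict

def token(partition_key: str) -> int:
    # Hash the partition key (first 8 bytes of MD5) to get a placement token.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")

def layout(pages, num_nodes=3):
    """Model storage order for rows keyed by (domain, url).

    pages: iterable of (domain, url, html) tuples.
    Returns {node: [(domain, url), ...]} in on-disk order.
    """
    nodes = defaultdict(list)
    for domain, url, _html in pages:
        t = token(domain)
        # Simplification: real Cassandra assigns token ranges on a ring.
        nodes[t % num_nodes].append((t, domain, url))
    # Within a node, partitions sort by token and rows by clustering column (url),
    # so all pages of one domain end up contiguous.
    return {node: [(d, u) for _t, d, u in sorted(rows)] for node, rows in nodes.items()}

pages = [(d, f"https://{d}/page{i}", "<html>...</html>")
         for i in range(3) for d in ("a.com", "b.org", "c.net", "d.io")]
for node, rows in sorted(layout(pages).items()):
    print(node, rows)
```

In CQL terms this corresponds to a primary key of `(domain, url)`: the hash of `domain` spreads domains across the cluster, while `url` orders a domain's pages next to each other inside its partition.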

vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Kevin Burton
So I see that Cassandra doesn't support bmdiff/vcdiff. Is this primarily because most people aren't using the ordered partitioner? bmdiff gets good compression by storing similar content next to each other on disk, so lots of HTML content would compress well. But if everything is being stored a