The general idea is that for HTML content, you want content from the same
domain to be adjacent on disk. That way, duplicate runs from shared HTML
templates compress REALLY well.

I think in our situation we would see exceptional compression.

If we get closer to this I'll just implement snappy+bmdiff...
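To make the intuition concrete: this isn't BMDiff, but any long-range compressor shows the same effect. The sketch below (standard-library `lzma`, chosen here only as a stand-in) compresses many near-identical copies of an HTML "template" and shows that the compressed size grows far slower than the input, which is the property being claimed above. The template string and repetition counts are made up for illustration.

```python
# Illustration of long-run compression (not BMDiff itself): N copies of
# the same content compress to roughly the size of one copy plus cheap
# back-references, so the ratio scales with the amount of duplication.
import lzma

# A made-up "HTML template" standing in for a crawled page.
template = b"<html><body>Example HTML template content.</body></html>\n" * 20

single = lzma.compress(template)
repeated = lzma.compress(template * 1000)  # 1000 near-identical "pages"

ratio = len(template * 1000) / len(repeated)
print(ratio)  # very large: duplicated runs are almost free
```

The same reasoning is why keeping same-domain pages adjacent matters: the duplicated template runs have to fall within the compressor's window to be found.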


On Thu, May 29, 2014 at 12:34 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sat, May 17, 2014 at 10:25 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> "compression" … sure.. but bmdiff? Not that I can find.  BMDiff is an
>> algorithm that in some situations could result in 100000x compression due
>> to the way it's able to find long common runs.  This is a pathological
>> case, though.  But if you were to copy the US Constitution into itself
>> … 100000x… BMDiff could ideally achieve a 100000x compression ratio.
>>
>> not all compression algorithms are identical.
>>
>
> The compression classes are pluggable. Exploratory patches are always
> welcome! :D
>
> Not sure I understand why you consider Byte Ordered Partitioner relevant;
> isn't what matters for compressibility generally the uniformity of data
> within rows in the SSTable, not the uniformity of their row keys?
>
> =Rob
>



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.
