I'm guessing something else is responsible for the compaction difference you're seeing: BytesType, UTF8Type, and AsciiType all use the same lexical byte comparison code. The only place you should expect to lose a small amount of performance with the latter two is at insert time, when the input is sanity-checked.
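To illustrate the point, here is a simplified Python model of the behavior described above. It is not Cassandra's actual (Java) code, and the function names are hypothetical. All three types share one unsigned, lexicographic byte comparison, and the only per-type work is a validation step that runs on insert. It also shows that a plain US-ASCII string encodes to exactly the same bytes whether you treat it as ASCII or as UTF-8.

```python
from functools import cmp_to_key

# Hypothetical model, not Cassandra code: one comparison shared by
# BytesType, AsciiType and UTF8Type, plus per-type insert-time checks.

def compare_bytes(a: bytes, b: bytes) -> int:
    """Unsigned lexicographic byte comparison, identical for all three types."""
    # Python's bytes ordering is already unsigned and lexicographic.
    return (a > b) - (a < b)

def sort_values(values: list) -> list:
    """Order values the same way regardless of their declared type."""
    return sorted(values, key=cmp_to_key(compare_bytes))

def validate_bytes(value: bytes) -> None:
    """BytesType-style: any byte sequence is accepted, no work done."""

def validate_ascii(value: bytes) -> None:
    """AsciiType-style: raises UnicodeDecodeError on any byte above 0x7F."""
    value.decode("ascii")

def validate_utf8(value: bytes) -> None:
    """UTF8Type-style: raises UnicodeDecodeError on malformed UTF-8."""
    value.decode("utf-8")

VALIDATORS = {
    "BytesType": validate_bytes,
    "AsciiType": validate_ascii,
    "UTF8Type": validate_utf8,
}

def insert(store: list, value: bytes, type_name: str) -> None:
    """The declared type only matters here: validate, then store raw bytes."""
    VALIDATORS[type_name](value)
    store.append(value)

if __name__ == "__main__":
    # Plain US-ASCII text is byte-identical under both encodings.
    print("hello".encode("ascii") == "hello".encode("utf-8"))
    print(sort_values([b"pear", b"apple", b"fig"]))
```

In this model, anything compaction-like (comparing and copying stored bytes) never consults the type, so switching from BytesType to UTF8Type changes only the cost of `insert`.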
On Sat, Nov 19, 2011 at 12:43 PM, Thorsten von Eicken <t...@rightscale.com> wrote:
> I recently changed the default_validation_class on a bunch of CFs from
> BytesType to UTF8Type and I observed two things: first I saw a number of
> compactions during the migration that showed ~200% to ~400% of original
> in the log entry. Second, it seems that compaction speed has now halved.
> I'm using v1.0.1, level compaction and compression. Before I create
> tests I thought I'd quickly ask: is there any difference in storage
> efficiency between BytesType, UTF8Type, and AsciiType when storing plain
> us-ascii strings? And is there any expected compaction speed difference?
> (It would be nice to have some docs about the expected storage space
> used for the various data types.)
> Thanks much!
> Thorsten

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com