Hello. I've copied a table from one keyspace into another using spark-cassandra-connector, and the sizes of the resulting SSTable data files differ by about 2x: the source Data.db file is ~450 MB and the target one is ~200 MB. Both tables were flushed and compacted before measurement, there is only one SSTable per table, and compression is off. Exporting each table with cqlsh COPY to a CSV file produces identical CSV files for both tables. The table structure is the same in both keyspaces, there is only one host in the Cassandra cluster, and the Cassandra version is 2.1.1.
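To put numbers on the gap, here is a small sketch of the arithmetic. It uses the "Space used (live)", "Compacted partition mean bytes" and "Bloom filter space used" values copied from the two nodetool cfstats outputs in this message. The dictionary keys are just illustrative labels, not nodetool field names. "Space used (live)" covers the whole live SSTable, not only Data.db, so it is somewhat larger than the Data.db sizes quoted above.

```python
# Figures copied from the two "nodetool cfstats" outputs in this message.
# The keys are illustrative labels chosen for this sketch.
CFSTATS = {
    "src.tbl": {
        "space_used_live": 496_725_694,    # Space used (live), bytes
        "partition_mean_bytes": 536,       # Compacted partition mean bytes
        "bloom_filter_bytes": 1_253_448,   # Bloom filter space used
    },
    "target.tbl": {
        "space_used_live": 224_972_892,
        "partition_mean_bytes": 215,
        "bloom_filter_bytes": 1_253_448,
    },
}


def ratio(field: str, a: str = "src.tbl", b: str = "target.tbl") -> float:
    """Return CFSTATS[a][field] / CFSTATS[b][field]."""
    return CFSTATS[a][field] / CFSTATS[b][field]


if __name__ == "__main__":
    for field in ("space_used_live", "partition_mean_bytes", "bloom_filter_bytes"):
        print(f"{field}: {ratio(field):.2f}x")
```

Running it prints roughly 2.21x for live space, 2.49x for mean partition size and 1.00x for bloom filter size. Because both tables use bloom_filter_fp_chance = 0.01, the identical bloom filter sizes suggest (though they do not prove) that both tables hold a similar number of partitions. If so, the difference comes from each source partition taking more bytes on disk rather than from extra rows.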
What can cause such a difference in SSTable sizes for the same data? I expected them to be identical.

nodetool cfstats src.tbl
Keyspace: src
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 496725694
        Space used (total): 496725694
        Space used by snapshots (total): 346576404
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 447
        Compacted partition maximum bytes: 642
        Compacted partition mean bytes: 536
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
----------------
nodetool cfstats target.tbl
Keyspace: target
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 224972892
        Space used (total): 224972892
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 180
        Compacted partition maximum bytes: 215
        Compacted partition mean bytes: 215
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

CREATE TABLE src.tbl (
    id text PRIMARY KEY,
    props map<text, text>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Regards,
Anton Lebedevich.