p.s. Cassandra 1.1.4
On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin <gaba...@gmail.com> wrote:
> Hi, all!
>
> We have a cluster of 7 virtual nodes (disk storage is connected to the
> nodes over iSCSI). The storage schema is:
>
> Reports:{
>   1:{
>     1:{"value1":"some val", "value2":"some val"},
>     2:{"value1":"some val", "value2":"some val"}
>     ...
>   },
>   2:{
>     1:{"value1":"some val", "value2":"some val"},
>     2:{"value1":"some val", "value2":"some val"}
>     ...
>   }
>   ...
> }
>
> create keyspace osmp_reports
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = {replication_factor : 4}
>   and durable_writes = true;
>
> use osmp_reports;
>
> create column family QueryReportResult
>   with column_type = 'Super'
>   and comparator = 'BytesType'
>   and subcomparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and read_repair_chance = 1.0
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 432000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY';
>
> =============================================
>
> Read/Write CL: 2
>
> Most of the reports are small, but some of them can have half a
> million rows (xml). Typical operations on this dataset are:
>
> count report rows by report_id (top level id of super column);
> get columns (report_rows) by range predicate and limit for given report_id.
>
> The data is written once and never updated.
>
> From time to time a couple of nodes crash with an OOM exception. The
> heap dump shows that we have a lot of super columns in memory.
> For example, I see that one of the reports is in memory entirely. How
> is that possible? If we don't load the whole report, could Cassandra
> do this for some internal reason?
>
> What should we do to avoid OOMs?
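
To make the two access patterns from the quoted message concrete (count rows by report_id, and get report rows by range predicate with a limit), here is a minimal Python sketch of paging through one report in bounded chunks. It is only an in-memory model, not a Cassandra client: `ReportStore`, `get_slice`, `iter_rows` and `count_rows` are hypothetical names, row ids are assumed to be integers, and nothing here shows what the server itself keeps in memory.

```python
from bisect import bisect_left, bisect_right


class ReportStore:
    """Hypothetical in-memory stand-in for one report (row id -> subcolumns)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.keys = sorted(self.rows)  # row ids kept in comparator order

    def get_slice(self, start, finish, limit):
        """Return up to `limit` (row_id, columns) pairs with start <= row_id <= finish."""
        lo = bisect_left(self.keys, start)
        hi = bisect_right(self.keys, finish)
        return [(k, self.rows[k]) for k in self.keys[lo:hi][:limit]]


def iter_rows(store, start, finish, page_size):
    """Yield rows page by page so that at most `page_size` rows are fetched at once."""
    next_start = start
    while next_start <= finish:
        page = store.get_slice(next_start, finish, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: the range is exhausted
        # Assumption: integer ids, so the next page starts right after the last id.
        next_start = page[-1][0] + 1


def count_rows(store, page_size=1000):
    """Count report rows by walking the pages instead of one unbounded read."""
    if not store.keys:
        return 0
    return sum(1 for _ in iter_rows(store, store.keys[0], store.keys[-1], page_size))


def make_report(n_rows):
    """Build a report shaped like the schema in the message, with ids 1..n_rows."""
    row = {"value1": "some val", "value2": "some val"}
    return ReportStore((i, dict(row)) for i in range(1, n_rows + 1))
```

For example, `count_rows(make_report(2500), page_size=1000)` walks three pages rather than requesting all rows in a single call. Whether the same client-side paging would prevent the OOMs described above is not something this sketch can establish.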