The following setting is probably not a good idea: bloom_filter_fp_chance = 1.0

It would disable the bloom filters altogether, and it doesn't have appreciably
greater benefits than a setting of 0.1 (which has the advantage of saving you
from disk I/O 90% of the time for keys that don't exist).

See: http://www.datastax.com/docs/1.1/configuration/storage_configuration#bloom-filter-fp-chance
And: http://www.datastax.com/docs/1.1/operations/tuning#tuning-bloomfilters
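If the aim is to shrink bloom filter memory without giving up the
negative-lookup savings entirely, the per-column-family change described in
the docs above is the gentler route. A minimal sketch, with "ks" and "cf"
standing in for your keyspace and column family names (check that your
version exposes the attribute; it is documented for 1.1):

In cassandra-cli:

    use ks;
    update column family cf with bloom_filter_fp_chance = 0.1;

Bloom filters are only rebuilt when SSTables are rewritten, so follow up on
each node with:

    nodetool -h <host> upgradesstables ks cf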

On Thu, Jul 4, 2013 at 8:32 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> @Michal: all true, a cleanup would certainly remove a lot of useless data
> there, and I also advised Evan to do it. However, Evan may want to continue
> repairing his cluster as a routine operation, and there is no reason an RF
> change should lead to this kind of issue.
>
> @Evan: With this amount of data, and not being on C* 1.2, you could try
> tuning your bloom filters to use less memory. Let's say disabling them
> while you recover from this issue (bloom_filter_fp_chance = 1.0), then
> upgrade sstables and retry repairing.
>
> This depends a lot on your needs and your context, but it might work if
> you can afford it.
>
> By the way, C* prior to 1.2 should not exceed 300-500 GB per node. I read
> once that C* 1.2 aims to reach 3-5 TB per node. Yet, horizontal scaling
> using peer-to-peer is one of the main points of Cassandra. You might want
> to be careful and scale when needed so you never reach that much data per
> node.
>
> As always, experts/committers, please correct me if I am wrong.
>
> Alain
>
>
> 2013/7/4 Michał Michalski <mich...@opera.com>
>
>> I don't think you need to run repair if you decrease RF. At least I
>> wouldn't do it.
>>
>> In case of *decreasing* RF you have 3 nodes containing some data, but
>> only 2 of them should store it from now on, so you should run cleanup
>> instead of repair to get rid of the data on the 3rd replica. And I guess
>> it should work (in terms of disk space and memory), if you've been able
>> to perform compaction.
>>
>> Repair makes sense if you *increase* RF, so the data are streamed to the
>> new replicas.
>>
>> M.
>>
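
For what it's worth, the cleanup Michał describes is a per-node nodetool
operation; a minimal sketch, with "ks" standing in for the keyspace name:

    # Run on every node after dropping RF; it rewrites SSTables and drops
    # rows the node no longer owns, so it needs compaction-like disk headroom.
    nodetool -h <host> cleanup ks
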
>> On 04.07.2013 12:20, Evan Dandrea wrote:
>>
>>> Hi,
>>>
>>> We've made the mistake of letting our nodes get too large, now holding
>>> about 3 TB each. We ran out of enough free space to have a successful
>>> compaction, and because we're on 1.0.7, enabling compression to get
>>> out of the mess wasn't feasible. We tried adding another node, but we
>>> think this may have put too much pressure on the existing ones it was
>>> replicating from, so we backed out.
>>>
>>> So we decided to drop RF down to 2 from 3 to relieve the disk pressure
>>> and started building a secondary cluster with lots of 1 TB nodes. We
>>> ran repair -pr on each node, but it's failing with a JVM OOM on one
>>> node while another node is streaming from it for the final repair.
>>>
>>> Does anyone know what we can tune to get the cluster stable enough to
>>> put it in a multi-DC setup with the secondary cluster? Do we actually
>>> need to wait for these RF3->RF2 repairs to stabilize, or could we
>>> point it at the secondary cluster without worry of data loss?
>>>
>>> We've set the heap on these two problematic nodes to 20 GB, up from the
>>> equally too high 12 GB, but we're still hitting OOM. I had seen in
>>> other threads that tuning down compaction might help, so we're trying
>>> the following:
>>>
>>> in_memory_compaction_limit_in_mb 32 (down from 64)
>>> compaction_throughput_mb_per_sec 8 (down from 16)
>>> concurrent_compactors 2 (the nodes have 24 cores)
>>> flush_largest_memtables_at 0.45 (down from 0.50)
>>> stream_throughput_outbound_megabits_per_sec 300 (down from 400)
>>> reduce_cache_sizes_at 0.5 (down from 0.6)
>>> reduce_cache_capacity_to 0.35 (down from 0.4)
>>>
>>> -XX:CMSInitiatingOccupancyFraction=30
>>>
>>> Here's the log from the most recent repair failure:
>>>
>>> http://paste.ubuntu.com/5843017/
>>>
>>> The OOM starts at line 13401.
>>>
>>> Thanks for whatever insight you can provide.
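
One note on where those knobs live: the compaction, flush, cache, and
streaming settings above go in cassandra.yaml, while the heap size and the
CMS trigger are set in conf/cassandra-env.sh. A rough sketch of the env-side
part, assuming a stock 1.0.x cassandra-env.sh (the values are the ones
mentioned in this thread or placeholders, not recommendations):

    # conf/cassandra-env.sh (sketch; exact layout varies by package/version)

    # Set the heap explicitly; the script expects these two variables as a pair.
    MAX_HEAP_SIZE="20G"
    HEAP_NEWSIZE="800M"     # placeholder value, not discussed in the thread

    # Edit the existing GC option (the stock file ships with 75):
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=30"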