With 50 billion rows and bloom_filter_fp_chance = 0.01, the bloom filters will consume a lot of off-heap memory. You may want to take that into consideration too.
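As a rough back-of-the-envelope check, here is a small sketch of the memory estimate. It uses the textbook formula for an optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2, and assumes one filter entry per row (that is, every row is its own partition). The function names are mine, and Cassandra's real sizing is done per SSTable with its own rounding, so treat the result as an order-of-magnitude figure only.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized Bloom filter (textbook formula)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_gib(num_keys: int, fp_chance: float) -> float:
    """Approximate total filter size in GiB for num_keys entries."""
    total_bits = num_keys * bloom_bits_per_key(fp_chance)
    return total_bits / 8 / 2**30


# Assumption: 50 billion keys, one per row, counted once (no replication).
NUM_KEYS = 50_000_000_000

for p in (0.1, 0.01, 0.001):
    print(
        f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, "
        f"{bloom_size_gib(NUM_KEYS, p):.1f} GiB"
    )
```

At fp_chance = 0.01 this comes to roughly 9.6 bits per key, on the order of 56 GiB for one copy of the data. The replication factor multiplies that across the cluster, and each tenfold reduction in fp_chance adds about 4.8 more bits per key.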
On Wed, May 18, 2016 at 11:53 PM, Adarsh Kumar <adarsh0...@gmail.com> wrote:

> Hi Sai,
>
> We have a use case where we are designing a table that is going to have
> around 50 billion rows, and we require very fast reads. Partitions are not
> that complex/big; each has some validation data for duplicate checks
> (consisting of 4-5 int and varchar). So we were trying various options to
> optimize read performance. Apart from tuning the Bloom filter, we are
> trying the following things:
>
> 1). Better data modelling (making appropriate partition and clustering
> keys)
> 2). Trying Leveled compaction (changing data model for this one)
>
> Jonathan,
>
> I understand that tuning bloom_filter_fp_chance will not give a drastic
> performance gain, but this is one of the many things we are trying.
> Please let me know if you have any other suggestions to improve read
> performance for this volume of data.
>
> Also, please let me know of any performance benchmark technique (currently
> we are planning to trigger massive reads from Spark and check cfstats).
>
> NOTE: we will be deploying DSE on EC2, so please suggest if you have
> anything specific to DSE and EC2.
>
> Adarsh
>
> On Wed, May 18, 2016 at 9:45 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> The impact is it'll get massively bigger with very little performance
>> benefit, if any.
>>
>> You can't get 0 because it's a probabilistic data structure. It tells
>> you either:
>>
>> your data is definitely not here
>> your data has a pretty decent chance of being here
>>
>> but never "it's here for sure"
>>
>> https://en.wikipedia.org/wiki/Bloom_filter
>>
>> On Wed, May 18, 2016 at 11:04 AM sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>> hi Adarsh;
>>> were there any drawbacks to setting the bloom_filter_fp_chance to
>>> the default value?
>>>
>>> thanks
>>> Sai
>>>
>>> On Wed, May 18, 2016 at 2:21 AM, Adarsh Kumar <adarsh0...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the impact of setting bloom_filter_fp_chance < 0.01?
>>>>
>>>> During performance tuning I was trying to tune bloom_filter_fp_chance
>>>> and have the following questions:
>>>>
>>>> 1). Why is bloom_filter_fp_chance = 0 not allowed? (
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5013)
>>>> 2). What is the maximum/recommended value of bloom_filter_fp_chance (if
>>>> we do not have any limitation on bloom filter size)?
>>>>
>>>> NOTE: We are using the default SizeTieredCompactionStrategy on
>>>> cassandra 2.1.8.621
>>>>
>>>> Thanks in advance..:)
>>>>
>>>> Adarsh Kumar
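
To illustrate the point quoted above, that a Bloom filter can say "definitely not here" or "possibly here" but never "here for sure", here is a toy sketch. This is not Cassandra's implementation: the class name, the SHA-256 based hashing, and the sizes are all assumptions picked for illustration.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter: a bit array plus k hash positions per key."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# About 9.6 bits per key and 7 hashes targets a false positive rate near 0.01.
bf = TinyBloomFilter(num_bits=9586, num_hashes=7)
present = [f"key-{i}" for i in range(1000)]
for k in present:
    bf.add(k)

absent = [f"other-{i}" for i in range(1000)]
false_positives = sum(bf.might_contain(k) for k in absent)
print("all inserted keys reported:", all(bf.might_contain(k) for k in present))
print("false positives among 1000 absent keys:", false_positives)
```

Every inserted key is always reported as possibly present, so there are no false negatives. A few absent keys may also be reported, and the only way to drive that count to exactly zero for arbitrary keys is to store the keys themselves, which is why a false positive chance of 0 is not achievable with this structure.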