> but based on how the rows are spread through the sstable files?

It's per sstable.
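A rough sketch of what "per sstable" can mean for the total filter size. It uses the textbook Bloom filter sizing formula, not Cassandra's actual implementation, and the row counts and false positive rate are made-up numbers. It also assumes that some rows end up in more than one sstable, which this thread does not confirm.

```python
import math

def bloom_filter_bytes(num_rows, false_positive_rate):
    """Textbook optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_rows * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

FP_RATE = 0.01  # hypothetical false positive rate, same before and after

# All 1000 (hypothetical) rows flushed into a single sstable.
single = bloom_filter_bytes(1000, FP_RATE)

# The same 1000 rows spread over three sstables, with 300 of them
# present in two sstables (an assumption made for illustration).
split = [700, 300, 300]
split_total = sum(bloom_filter_bytes(n, FP_RATE) for n in split)

print("one sstable   :", single, "bytes")
print("three sstables:", split_total, "bytes")
```

Each filter is sized from the rows in its own sstable, so a row that sits in two sstables is counted twice in the total even though the number of rows in the column family is unchanged.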
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 8:51 AM, "Hiller, Dean" <[email protected]> wrote:

> Thanks for the great info, I will give it a go.
>
> 1 question though, my false positive rate and number of rows is not changing
> so why is the bloomfilter bigger? Or do you mean bloomfilter is not based on
> number of rows in the table but based on how the rows are spread through the
> sstable files?
>
> Ie. I have the same amount of rows before and after in that specific column
> family.
>
> Thanks,
> Dean
>
> From: aaron morton <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, March 6, 2013 9:29 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: should I file a bug report on this or is this normal?
>
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
> This may be explained by fragmentation in the sstables, which compaction
> would eventually resolve.
>
> During repair the data came from multiple nodes and created multiple sstables
> for each CF. Streaming copies part of an SSTable on the source and creates an
> SSTable on the destination. This pattern is different to all writes for a CF
> going to the same sstable when flushed.
>
> To compare apples to apples run a major compaction after the initial data
> load, and after the repair.
>
> 1. Why is the bloomfilter for level 5 a total of 3856 bytes for 29118(large
> to small) bytes of data while in the initial data it was 2192 bytes for
> 43038(small to large) bytes of data?
> The size of the BF depends on the number of rows and the false positive rate.
> Not the size of the -Data.db component on disk.
>
> 2. Why is there 3 levels?
> With such a small set of data, I would think it
> would flush one data file like the original data but instead there is 3 files.
> See above.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/03/2013, at 6:40 AM, "Hiller, Dean" <[email protected]> wrote:
>
> I ran a pretty solid QA test(cleaned data from scratch) on version 1.2.2
>
> My test was as so
>
> 1. Start up 4 node cassandra cluster
> 2. Populate with initial test data (no other data is added to system after
>    this point!!!)
> 3. Run nodetool drain on every node(move stuff from commit log to sstables)
> 4. Stop and start cassandra cluster to have it running again
> 5. Get size of nreldata CF folder is 128kB
> 6. Go to node 3, run snapshot and mv snapshots directory OUT of nreldata
> 7. Get size of nreldata CF folder is 128kB
> 8. On node 3, run nodetool drain
> 9. Get size of nreldataCF folder is still 128kB
> 10. Stop cassandra node
> 11. Rm <keyspace>/nreldata/*.db
> 12. Size of nreldata CF is 8kb(odd of an empty folder but ok)
> 13. Start cassandra
> 14. Nodetool repair databus5 nreldata
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
>
> I ran this QA test as we see data size explosion in production as well(I
> can't be 100% sure if this is the same thing though as above is such a small
> data set). Would leveled compaction be a bit more stable in terms of size
> ratios and such.
>
> QUESTIONS
>
> 1. Why is the bloomfilter for level 5 a total of 3856 bytes for 29118(large
>    to small) bytes of data while in the initial data it was 2192 bytes for
>    43038(small to large) bytes of data?
> 2. Why is there 3 levels? With such a small set of data, I would think it
>    would flush one data file like the original data but instead there is 3 files.
>
> My files after repair have levels 5, 6, and 7. My files before deletion of
> the CF have just level 1.
> After repair files are
> -rw-rw-r--. 1 cassandra cassandra    54 Mar 6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 29118 Mar 6 07:18 databus5-nreldata-ib-5-Data.db
> -rw-rw-r--. 1 cassandra cassandra  3856 Mar 6 07:18 databus5-nreldata-ib-5-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 37000 Mar 6 07:18 databus5-nreldata-ib-5-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4772 Mar 6 07:18 databus5-nreldata-ib-5-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   383 Mar 6 07:18 databus5-nreldata-ib-5-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-5-TOC.txt
> -rw-rw-r--. 1 cassandra cassandra    46 Mar 6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-6-Data.db
> -rw-rw-r--. 1 cassandra cassandra   816 Mar 6 07:18 databus5-nreldata-ib-6-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-6-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:18 databus5-nreldata-ib-6-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   230 Mar 6 07:18 databus5-nreldata-ib-6-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-6-TOC.txt
> -rw-rw-r--. 1 cassandra cassandra    46 Mar 6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-7-Data.db
> -rw-rw-r--. 1 cassandra cassandra   816 Mar 6 07:18 databus5-nreldata-ib-7-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-7-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:18 databus5-nreldata-ib-7-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   230 Mar 6 07:18 databus5-nreldata-ib-7-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-7-TOC.txt
>
> Before repair files(from my moved snapshot as I moved it out of the directory
> so cassandra no longer had it)….
> -rw-rw-r--. 1 cassandra cassandra    62 Mar 6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 43038 Mar 6 07:11 databus5-nreldata-ib-1-Data.db
> -rw-rw-r--. 1 cassandra cassandra  2192 Mar 6 07:11 databus5-nreldata-ib-1-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 55248 Mar 6 07:11 databus5-nreldata-ib-1-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:11 databus5-nreldata-ib-1-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   499 Mar 6 07:11 databus5-nreldata-ib-1-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:11 databus5-nreldata-ib-1-TOC.txt
>
> Thanks,
> Dean
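For reference, the sizes in the two listings above can be totalled per component. This is only arithmetic on the numbers quoted in the thread. The dictionary layout and the function name are my own, and only the Data, Filter and Index components are included.

```python
# File sizes in bytes, copied from the ls listings in this thread.
before = {"ib-1": {"Data": 43038, "Filter": 2192, "Index": 55248}}
after = {
    "ib-5": {"Data": 29118, "Filter": 3856, "Index": 37000},
    "ib-6": {"Data": 14271, "Filter": 816, "Index": 18248},
    "ib-7": {"Data": 14271, "Filter": 816, "Index": 18248},
}

def totals(sstables):
    """Sum each component's size across all sstables in a listing."""
    out = {}
    for components in sstables.values():
        for name, size in components.items():
            out[name] = out.get(name, 0) + size
    return out

before_total = totals(before)
after_total = totals(after)
for name in ("Data", "Filter", "Index"):
    print(f"{name:6s} before={before_total[name]:6d} after={after_total[name]:6d}")

# Index.db of ib-5 and ib-6 alone, for comparison with ib-1.
index_5_plus_6 = after["ib-5"]["Index"] + after["ib-6"]["Index"]
print("ib-5 + ib-6 Index:", index_5_plus_6)
```

The Index.db sizes of ib-5 and ib-6 add up to 55248 bytes, the same as ib-1, and ib-6 and ib-7 have identical sizes for every component. That would be consistent with ib-7 holding the same rows as ib-6, although the listing alone cannot confirm it.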
