> but based on how the rows are spread through the sstable files?
It's per sstable. Each sstable has its own bloom filter (the -Filter.db component), sized for the rows in that file.
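
As a rough illustration of why the total can grow even though the row count for the CF is unchanged, here is a small sketch. It uses the textbook Bloom filter sizing formula, which is only an approximation and not Cassandra's actual implementation. The false positive rate and the per-sstable key counts are made-up numbers, and the sketch assumes the same row key can end up in more than one sstable after repair.

```python
import math


def bloom_filter_bytes(num_keys: int, fp_rate: float) -> int:
    """Textbook Bloom filter size in bytes for num_keys at fp_rate.

    Uses m = -n * ln(p) / (ln 2)^2 bits. This is an approximation,
    not the sizing code Cassandra itself uses.
    """
    if num_keys <= 0:
        return 0
    bits = math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))
    return math.ceil(bits / 8)


FP_RATE = 0.01  # hypothetical false positive rate

# Hypothetical: one flushed sstable holding all 1000 rows.
single = bloom_filter_bytes(1000, FP_RATE)

# Hypothetical: three streamed sstables after repair. Rows that arrive
# from more than one node are counted once per sstable they land in,
# so the key counts add up to more than 1000.
keys_per_sstable = [700, 350, 350]
split = sum(bloom_filter_bytes(n, FP_RATE) for n in keys_per_sstable)

print("one sstable   :", single, "bytes")
print("three sstables:", split, "bytes")
```

Under those assumptions the three filters together are larger than the single one, and a major compaction would merge them back into one filter sized for the distinct rows.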

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 8:51 AM, "Hiller, Dean" <[email protected]> wrote:

> Thanks for the great info, I will give it a go.
> 
> One question though: my false positive rate and number of rows are not
> changing, so why is the bloom filter bigger? Or do you mean the bloom filter
> is not based on the number of rows in the table but on how the rows are
> spread through the sstable files?
> 
> I.e., I have the same number of rows before and after in that specific column
> family.
> 
> 
> Thanks,
> Dean
> 
> From: aaron morton <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, March 6, 2013 9:29 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: should I file a bug report on this or is this normal?
> 
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
> This may be explained by fragmentation in the sstables, which compaction 
> would eventually resolve.
> 
> During repair the data came from multiple nodes and created multiple sstables 
> for each CF. Streaming copies part of an SSTable on the source and creates an 
> SSTable on the destination. This pattern is different to all writes for a CF 
> going to the same sstable when flushed.
> 
> To compare apples to apples run a major compaction after the initial data 
> load, and after the repair.
> 
> 1.  Why is the bloomfilter for level 5 a total of 3856 bytes for 29118(large 
> to small) bytes of data while in the initial data it was 2192 bytes for 
> 43038(small to large) bytes of data?
> The size of the BF depends on the number of rows and the false positive rate. 
> Not the size of the -Data.db component on disk.
> 
> 2.  Why are there 3 levels?  With such a small set of data, I would think it 
> would flush one data file like the original data, but instead there are 3 files.
> See above.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6/03/2013, at 6:40 AM, "Hiller, Dean" <[email protected]> wrote:
> 
> I ran a fairly thorough QA test (data cleaned, starting from scratch) on
> version 1.2.2.
> 
> My test was as follows:
> 
> 1.  Start up a 4-node cassandra cluster
> 2.  Populate with initial test data (no other data is added to the system
> after this point!!!)
> 3.  Run nodetool drain on every node (move stuff from commit log to sstables)
> 4.  Stop and start the cassandra cluster to have it running again
> 5.  Size of nreldata CF folder is 128kB
> 6.  Go to node 3, run snapshot and mv the snapshots directory OUT of nreldata
> 7.  Size of nreldata CF folder is 128kB
> 8.  On node 3, run nodetool drain
> 9.  Size of nreldata CF folder is still 128kB
> 10. Stop the cassandra node
> 11. rm <keyspace>/nreldata/*.db
> 12. Size of nreldata CF folder is 8kB (odd for an empty folder, but OK)
> 13. Start cassandra
> 14. nodetool repair databus5 nreldata
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
> 
> I ran this QA test because we see data size explosion in production as well
> (I can't be 100% sure it is the same thing, though, as the above is such a
> small data set). Would leveled compaction be a bit more stable in terms of
> size ratios and such?
> 
> QUESTIONS
> 
> 1.  Why is the bloomfilter for level 5 a total of 3856 bytes for 29118(large 
> to small) bytes of data while in the initial data it was 2192 bytes for 
> 43038(small to large) bytes of data?
> 2.  Why are there 3 levels?  With such a small set of data, I would think it 
> would flush one data file like the original data, but instead there are 3 files.
> 
> My files after repair have levels 5, 6, and 7. My files before deletion of
> the CF have just level 1. The files after repair are:
> -rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
> -rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
> -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
> -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-6-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-6-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-6-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-6-TOC.txt
> -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-7-Data.db
> -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-7-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-7-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-7-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-7-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-7-TOC.txt
> 
> The files before repair (from my snapshot, which I moved out of the directory
> so cassandra no longer had it) are:
> -rw-rw-r--. 1 cassandra cassandra    62 Mar  6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 43038 Mar  6 07:11 databus5-nreldata-ib-1-Data.db
> -rw-rw-r--. 1 cassandra cassandra  2192 Mar  6 07:11 databus5-nreldata-ib-1-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 55248 Mar  6 07:11 databus5-nreldata-ib-1-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4756 Mar  6 07:11 databus5-nreldata-ib-1-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   499 Mar  6 07:11 databus5-nreldata-ib-1-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar  6 07:11 databus5-nreldata-ib-1-TOC.txt
> 
> Thanks,
> Dean
> 
> 
