I think if I can keep each sstable file at a reasonable size, the hot data/index files may be able to fit into memory, at least in some cases.
In my use case, I want to use Cassandra to store a large amount of log data. There will be multiple nodes, and each node has 10 x 2TB disks to hold as much data as possible, ideally 20TB (about 100 billion rows) per node. Reads will be far less frequent than writes, and a read latency within 1 second is acceptable. Is this feasible? Do you have any advice on this design?

Thank you.
Sheng

2011/4/3 aaron morton <aa...@thelastpickle.com>

> With only one data file your reads would use the least amount of IO to find the data.
>
> Most people have multiple nodes and probably fewer disks, so each node may have a TB or two of data. How much capacity do your 10 disks give? Will you be running multiple nodes in production?
>
> Aaron
>
> On 2 Apr 2011, at 12:45, Sheng Chen wrote:
>
> Thank you very much.
>
> A major compaction will merge everything into one big file, which would be very large. Is there any way to control the number or size of the files created by a major compaction? Or is there a recommended number or size of files for Cassandra to handle?
>
> Thanks. I see the trigger of my minor compactions is OperationsInMillions. It is the total number of operations, which I had thought was a per-second rate.
>
> Cheers,
> Sheng
>
> 2011/4/1 aaron morton <aa...@thelastpickle.com>
>
>> If you are doing some sort of bulk load, you can disable minor compactions by setting min_compaction_threshold and max_compaction_threshold to 0. Then, once your insert is complete, run a major compaction via nodetool before turning minor compactions back on.
>>
>> You can also reduce the compaction threads' priority; see compaction_thread_priority in the yaml file.
>>
>> The memtable is flushed when either the MB or the ops threshold is reached. If you are seeing a lot of memtables smaller than the MB threshold, then the ops threshold is probably being triggered. Look for an INFO-level log message starting with "Enqueuing flush of Memtable"; it will tell you how many bytes and ops the memtable had when it was flushed. Try increasing the ops threshold and see what happens.
>>
>> Your change to the compaction threshold may not have had an effect because the compaction process was already running.
>>
>> AFAIK, the best way to get the most out of your 10 disks is to use a dedicated mirror for the commit log and a stripe set for the data.
>>
>> Hope that helps.
>> Aaron
>>
>> On 1 Apr 2011, at 14:52, Sheng Chen wrote:
>>
>> > I have a single node running Cassandra 0.7.4, and I used the Java stress tool to insert about 100 million records.
>> > The inserts took about 6 hours (45k inserts/sec), but the minor compactions that followed have been running for 2 days and the pending compaction tasks are still increasing.
>> >
>> > From jconsole I can read MemtableThroughputInMB=1499 and MemtableOperationsInMillions=7.0, but in my data directory I have hundreds of 438MB data files, which must be what is driving the minor compactions.
>> >
>> > I tried to set the compaction threshold via nodetool, but it didn't seem to take effect (no change in pending compaction tasks), and after restarting the node my setting was lost.
>> >
>> > I want to distribute the read load across my disks (10 disks, xfs, LVM), so I don't want to run a major compaction.
>> > What can I do to keep the sstable files at a reasonable size, or to make the minor compactions faster?
>> >
>> > Thank you in advance.
>> > Sheng
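The bulk-load workflow Aaron describes above can be scripted. A minimal sketch, assuming the 0.7-era nodetool syntax and the stress tool's default schema (Keyspace1/Standard1); substitute your own host, keyspace, and column family names:

    # Disable minor compactions by zeroing both thresholds
    # (note: not persisted across restarts, as observed above).
    nodetool -h localhost setcompactionthreshold Keyspace1 Standard1 0 0

    # ... run the bulk load ...

    # Flush the remaining memtables, then run one major compaction.
    nodetool -h localhost flush Keyspace1 Standard1
    nodetool -h localhost compact Keyspace1 Standard1

    # Turn minor compactions back on with the default thresholds (4/32).
    nodetool -h localhost setcompactionthreshold Keyspace1 Standard1 4 32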
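To confirm which flush trigger is firing, the "Enqueuing flush of Memtable" log line Aaron mentions can be grepped from the system log, and the ops threshold can then be raised in the schema so it survives a restart. A sketch, assuming the 0.7 cassandra-cli attribute name (memtable_operations, in millions of operations) and an illustrative new value; check your version's CLI help before relying on it:

    # Each flush logs how many bytes and operations the memtable held.
    grep "Enqueuing flush of Memtable" /var/log/cassandra/system.log

    # In cassandra-cli, persist a higher ops threshold in the schema
    # (unlike a JMX/nodetool change, this survives a node restart):
    #   use Keyspace1;
    #   update column family Standard1 with memtable_operations = 20;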
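For Aaron's suggested disk layout, the relevant cassandra.yaml settings would look something like the following; the mount points are hypothetical and stand for a mirrored pair dedicated to the commit log plus a stripe set over the remaining disks:

    # cassandra.yaml
    commitlog_directory: /mnt/commitlog    # dedicated RAID-1 mirror
    data_file_directories:
        - /mnt/data                        # RAID-0 stripe across the other disks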