Hello Edward,

Thank you for your insight into large disk nodes.
From: "Edward Capriolo" <edlinuxg...@gmail.com> > Index sampling on start up. If you have very small rows your indexes > become large. These have to be sampled on start up and sampling our > indexes for 300Gb of data can take 5 minutes. This is going to be > optimized soon. 5 minutes for 300 GB data ... it's not cheap, is it? Simply, 3 TB of data will leat to 50 minutes just for computing input splits. This is too expensive when I want only part of the 3 TB data. > (Just wanted to note some of this as I am in the middle of a process > of joining a node now :) Good luck. I'd appreciate if you could some performance numbers of joining nodes (amount of data, time to distribute data, load impact on applications, etc) if you can. The cluster our customer is thinking of is likely to become very large, so I'm interested in the elasticity. Yahoo!'s YCSB report makes me worry about adding nodes. Regards, Takayuki Tsunakawa From: "Edward Capriolo" <edlinuxg...@gmail.com> [Q3] There are some challenges with very large disk nodes. Caveats: I will use words like "long", "slow", and "large" relatively. If you have great equipment IE. 10G Ethernet between nodes it will not take "long" to transfer data. If you have an insane disk pack it may not take "long" to compact 200GB of data. I am basing these statements on server class hardware. ~32 GB ram ~2x processor, ~6 disk SAS RAID. Index sampling on start up. If you have very small rows your indexes become large. These have to be sampled on start up and sampling our indexes for 300Gb of data can take 5 minutes. This is going to be optimized soon. Joining nodes: When you go with larger systems joining a new node involves a lot of transfer, and can take a "long" time. Node join process is going to be optimized in 0.7 and 0.8 (quite drastic changes in 0.7) Major compaction and very large normal compaction can take a "long" time. For example while doing a 200 GB compaction that takes 30 minutes, other sstables build up, more sstables mean "slower" reads. Achieving a high RAM/DISK ratio may be easier with smaller nodes vs one big node with 128 GB RAM $$$. As Jonathan pointed out nothing technically is stopping larger disk nodes. (Just wanted to note some of this as I am in the middle of a process of joining a node now :)