Hi!
We're experimenting with streaming from Hadoop to Cassandra using
BulkOutputFormat, on the cassandra-1.1 branch.
Are there any specific settings we should tune on the Cassandra servers
in order to get the best streaming performance?
Our Cassandra servers have 16 cores each (including HT cores) and 24 GiB
of RAM. They have two disks each. So far we've configured them with the
commitlog on one disk and sstables on the other, but since streaming does
not use the commitlog (correct?), maybe it makes sense to put sstables on
both disks, doubling the available I/O?
Any thoughts on the number of parallel streaming clients?
Thanks,
\EF