On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:

>> We've been having issues where as soon as we start doing heavy writes (via
>> hadoop) recently, it really hammers 4 nodes out of 20. We're using random
>> partitioner and we've set the initial tokens for our 20 nodes according to
>> the general spacing formula, except for a few token offsets as we've
>> replaced dead nodes.
>
> Is the hadoop job iterating over keys in the cluster in token order
> perhaps, and you're generating writes to those keys? That would
> explain a "moving hotspot" along the cluster.
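(Aside: the "general spacing formula" referred to above is usually computed along these lines; a minimal Python sketch, assuming the standard RandomPartitioner token range of 0..2**127 and the 20-node cluster described in the quoted message.)

    # Evenly spaced initial tokens for RandomPartitioner, whose token space
    # runs from 0 to 2**127.  node_count = 20 matches the cluster size above.
    node_count = 20
    for i in range(node_count):
        print("node %2d  initial_token: %d" % (i, i * (2**127) // node_count))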
Yes - we're iterating over all the keys of particular column families, doing
joins using Pig as we enrich and perform measure calculations. When we write,
we're usually writing out for a certain small subset of keys, which shouldn't
have hotspots with RandomPartitioner AFAICT.

>
> --
> / Peter Schuller (@scode on twitter)
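(For anyone following along: the reason a small key subset shouldn't hotspot is that RandomPartitioner derives a row's token from the MD5 hash of its key, so even "nearby" keys scatter across the ring. A rough Python illustration; this only approximates Cassandra's exact mapping, which takes the absolute value of the signed 128-bit digest, and the sample keys are made up.)

    import hashlib

    # Approximate RandomPartitioner token: MD5 of the key, folded into 0..2**127.
    def approx_token(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2**127)

    for key in ("user:1001", "user:1002", "user:1003"):
        print("%s -> %d" % (key, approx_token(key)))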