On May 28, 2009, at 10:32 AM, Ian Soboroff wrote:
> Brian Bockelman <[email protected]> writes:
>> Despite my trying, I've never been able to come even close to pegging
>> the CPUs on our NN.
>> I'd recommend going for the fastest dual-cores which are affordable --
>> latency is king.
> Clue?
> Surely the latencies in Hadoop that dominate are not cured with faster
> processors, but with more RAM and faster disks?
> I've followed your posts for a while, so I know you are very experienced
> with this stuff... help me out here.
Actually, that's more of a gut feeling than an informed decision.
Because the locking is rather coarse-grained, having many CPUs isn't
going to win anything -- I'd rather have any CPU-related portions run as
fast as possible. Under the highest load, I think we've been able to
get up to 25% CPU utilization: thus, I'm guessing any CPU-related
improvements will come from faster cores, not more of them.
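One rough way to picture the "faster cores, not more cores" argument is
Amdahl's law. The sketch below is only an illustration: the function name
and the 0.75 serial fraction (the share of CPU work assumed to run under
the coarse lock) are hypothetical, not measurements from any namenode.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup over a single core when `serial_fraction` of the work
    must run one thread at a time (e.g. under a coarse-grained lock)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical assumption: 75% of CPU work is serialized by the lock.
SERIAL_FRACTION = 0.75

for cores in (2, 4, 8, 16):
    speedup = amdahl_speedup(SERIAL_FRACTION, cores)
    print(f"{cores:2d} cores: {speedup:.2f}x")
```

Under that assumed fraction the speedup can never exceed 1/0.75, or about
1.33x, however many cores are added, whereas a faster core also shortens
the serialized portion.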
For my cluster, if I had a lot of money, I'd spend it on a hot-spare
machine. Then, I'd spend it on upgrading the RAM, followed by disks,
followed by CPU.
Then again, for the cluster in the original email, I'd save money on
the namenode and buy more datanodes. We've got about 200 nodes and
probably have a comparable NN.
Brian