We have a Hadoop cluster with 32 data nodes that receives incoming Flume data 
via three data nodes acting as Flume agents. We’re using round-robin DNS entries 
to spread incoming Flume data from various external architectures across the 
Flume agents on those three data nodes.

Historically, the three data nodes that run the Flume agents have always held 
many more blocks than the other data nodes, so I’m wondering what the best 
approach to placing Flume agents within a cluster would be. Should every data 
node in the cluster run a Flume agent, or should the Flume agent be placed on 
a name node or other non-data node?
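
My working guess at the cause is HDFS's default replica placement, which puts 
the first replica of each block on the writer's own node when the writer is a 
data node. To illustrate the skew I'm describing, here is a rough simulation, 
not a measurement from our cluster. The function name, node numbering, block 
count, and single-rack layout are all made up for the sketch, and rack 
awareness is ignored.

```python
import random
from collections import Counter


def simulate_block_placement(num_nodes=32, writer_nodes=(0, 1, 2),
                             num_blocks=30000, replication=3, seed=0):
    """Count replicas per node under a simplified placement rule.

    Assumption: the first replica lands on the writing data node and the
    remaining replicas go to distinct, randomly chosen other nodes.
    """
    rng = random.Random(seed)
    counts = Counter()
    nodes = list(range(num_nodes))
    for i in range(num_blocks):
        # Round-robin DNS is modeled as cycling through the writer nodes.
        writer = writer_nodes[i % len(writer_nodes)]
        counts[writer] += 1
        others = [n for n in nodes if n != writer]
        for n in rng.sample(others, replication - 1):
            counts[n] += 1
    return counts


counts = simulate_block_placement()
writer_counts = [counts[n] for n in (0, 1, 2)]
other_counts = [counts[n] for n in range(3, 32)]
print("agent nodes:", writer_counts)
print("other nodes (min, max):", min(other_counts), max(other_counts))
```

In this toy model the three agent nodes end up holding several times as many 
blocks as any other node, which resembles what we see.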

Thanks for any guidance.
