Dear all, I understand that specifying data placement may not be in the spirit of Hadoop, however I would like to explicitly locate data input on various nodes to assert the value of data locality and cost of network transfer.
Does anyone have any advice about how I can go about the data placement? Thanks, Yipeng