OK, I am getting a little confused now. Suppose I am working in a scenario where there is no limit on the memory available. In that scenario, is there any advantage to storing data in HDFS in a compressed format? For example, if node 1 holds the data and is busy executing a particular task while node 2 is free, then the data needs to be transferred from node 1 to node 2, right? Is there any network advantage, or any other benefit, to storing the data in HDFS in a compressed format?
I am not talking about compression in the intermediate steps (such as between mapper and reducer, or between MapReduce jobs), but about compression of the data stored in HDFS, which has to be decompressed for processing and therefore adds processing-time overhead.
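
To make the trade-off I am asking about concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the function name, its parameters, and the numbers in the usage lines are hypothetical and not measurements from any real cluster. It compares shipping raw data to a free node against shipping compressed data and then paying the decompression cost.

```python
def remote_read_seconds(raw_bytes, compression_ratio,
                        network_bytes_per_sec,
                        decompress_bytes_per_sec=None):
    """Estimate the time for a free node to obtain usable data from another node.

    All parameters are assumptions chosen for illustration:
    - raw_bytes: size of the data once decompressed
    - compression_ratio: raw size divided by compressed size (e.g. 4 means 4:1)
    - network_bytes_per_sec: transfer rate between the two nodes
    - decompress_bytes_per_sec: rate at which decompressed output is produced;
      None means the data is stored uncompressed
    """
    if decompress_bytes_per_sec is None:
        # Uncompressed: the only cost is moving every raw byte over the network.
        return raw_bytes / network_bytes_per_sec

    # Compressed: fewer bytes cross the network, but every raw byte
    # must then be reconstructed on the receiving node.
    compressed_bytes = raw_bytes / compression_ratio
    transfer = compressed_bytes / network_bytes_per_sec
    decompress = raw_bytes / decompress_bytes_per_sec
    return transfer + decompress


# Made-up numbers, purely to show how the comparison would be set up.
uncompressed = remote_read_seconds(1000, 4, 100)
compressed = remote_read_seconds(1000, 4, 100, decompress_bytes_per_sec=500)
print(uncompressed, compressed)
```

Under this simple model, compression only helps when the network time saved exceeds the decompression time added. Whether that actually holds for data stored in HDFS is exactly what I am unsure about.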