OK, I am getting a little confused now.

Consider a scenario where there is no limit on the memory available.
In such a scenario, is there any advantage to storing data in HDFS in a
compressed format? For example, if node 1 holds the data and is busy
executing a particular task while node 2 is free, then the data needs to be
transferred from node 1 to node 2, right? Is there any network advantage,
or any other advantage, to storing the data in HDFS in a compressed format?

I am not talking about compression in the intermediate steps (such as
between mapper and reducer, or between MapReduce jobs), but about
compression of the data stored in HDFS, which has to be decompressed for
processing and therefore adds processing-time overhead.
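
To make the trade-off I am asking about concrete, this is roughly how I
picture it. It is only a minimal sketch: the function name is made up, and
the block size, compression ratio, and throughput figures are assumptions
for illustration, not measurements from any cluster.

```python
def remote_read_seconds(raw_mb, ratio, network_mb_s, decompress_mb_s=None):
    """Rough time to move one block to another node and make it usable.

    raw_mb          -- uncompressed block size in MB
    ratio           -- stored size / uncompressed size (1.0 = not compressed)
    network_mb_s    -- network throughput in MB/s
    decompress_mb_s -- decompression throughput in MB/s of output,
                       or None when the block is stored uncompressed
    """
    transfer = (raw_mb * ratio) / network_mb_s
    decompress = 0.0 if decompress_mb_s is None else raw_mb / decompress_mb_s
    return transfer + decompress


# Hypothetical numbers: 128 MB block, 100 MB/s network,
# a codec that halves the size and decompresses at 400 MB/s.
plain = remote_read_seconds(128, 1.0, 100)
packed = remote_read_seconds(128, 0.5, 100, decompress_mb_s=400)
print(f"uncompressed: {plain:.2f} s, compressed: {packed:.2f} s")
```

With these made-up figures the compressed block comes out ahead, but a
slower codec or a faster network would flip the result, which is exactly
the part I am unsure about.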
