A lot of work in Hadoop concerns splittable compression.  Could this
be solved by offering compression at the HDFS block (i.e. 64 MB) level,
just as many OS filesystems do?

http://stackoverflow.com/questions/6511255/why-cant-hadoop-split-up-a-large-text-file-and-then-compress-the-splits-using-g?rq=1
discusses this and suggests the issue is separation of concerns.
However, if the compression is done at the *HDFS block* level (with
perhaps a single flag indicating as much), it would be completely
transparent to readers and writers.  This is exactly how NTFS
compression works, for example: applications need no knowledge of the
compression.  Since HDFS allows only streaming access rather than
random reads and writes, it is a perfect candidate for this.
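To make the idea concrete, here is a rough Java sketch of what the
reader side could look like.  None of these names are real HDFS code;
TransparentBlockReader, BlockMeta and openRawBlock are made-up
stand-ins, and the single boolean flag is the addition being
suggested.  The point is that the decompression can live entirely in
the client library, so applications only ever see plain bytes:

  import java.io.IOException;
  import java.io.InputStream;
  import java.util.zip.GZIPInputStream;

  class TransparentBlockReader {

    // Stand-in for whatever per-block metadata the NameNode returns;
    // the "compressed" flag is the proposed addition.
    static class BlockMeta {
      final boolean compressed;
      BlockMeta(boolean compressed) { this.compressed = compressed; }
    }

    // Placeholder for the real block-transfer path that streams one
    // block's raw bytes from a DataNode.
    InputStream openRawBlock(long blockId) throws IOException {
      throw new UnsupportedOperationException("placeholder");
    }

    // The only reader-side change needed: if the block was stored
    // compressed, decompress on the fly; otherwise pass through.
    InputStream openBlock(long blockId, BlockMeta meta) throws IOException {
      InputStream raw = openRawBlock(blockId);
      return meta.compressed ? new GZIPInputStream(raw) : raw;
    }
  }

The writer side would presumably be the mirror image: compress each
block before it is sent to the DataNodes and set the flag in the block
metadata.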

Thoughts?
