Thanks, Yongqiang!

Could you please confirm my understanding of how to use block compression?

As of now, I am setting these properties before populating the table that should contain compressed data:
SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
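
A minimal sketch of how these settings would typically be used end to end, assuming a source table and a SequenceFile target table. The table names (events_raw, events_compressed), the columns, and the index name (events_idx) are hypothetical placeholders; only the three SET lines come from the message above.

```sql
-- Hypothetical target table stored as SequenceFile (names and columns are placeholders).
CREATE TABLE events_compressed (
  event_time STRING,
  payload    STRING
)
STORED AS SEQUENCEFILE;

-- Session settings quoted in the message above.
SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Rewrite the data; the files written by this query pick up the settings above.
INSERT OVERWRITE TABLE events_compressed
SELECT event_time, payload
FROM events_raw;

-- Assumption: if an index (here the hypothetical events_idx) exists on the
-- target table, it is not refreshed automatically and would need a rebuild
-- after the data is rewritten.
ALTER INDEX events_idx ON events_compressed REBUILD;
```

The SET statements apply only to the current session, so they would need to be issued again in any later session that writes to the table.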

Question 1:
Do I need to set io.seqfile.compress.blocksize? If so, to what? It's set to 1000000 by default.

Question 2:
Do I need to set hive.merge.mapfiles? If so, to what? It's set to true by default.

Question 3:
Are there any other options I need to set?

Thanks again!

Mark
P.S. I am using Hive 0.7.1 with Hadoop 0.20.

On 11-09-15 05:41 PM, yongqiang he wrote:
Question 1:
Indexing should work for both, but I suggest you use block compression.

Question 3 (and perhaps, the most important):
Block-based compression.


On Thu, Sep 15, 2011 at 2:16 PM, Mark Grover <mgro...@oanda.com> wrote:
Hi all,
I have a question regarding compression and indexing.

I would like to compress our Hive data (currently stored as SequenceFile).
I also have an index on this table and would like to keep using it after
the data is compressed.

Question 1:
SequenceFile compression can be block-based or record-based. For indexing to
work, do I need to use block-based compression? If both block-based and
record-based compression work with indexing, can someone provide insight into
which to use when?

Question 2:
BZip2 is also a block-based compression format and is splittable in Hadoop. Do
you see any issues with storing data in BZip2 files and using indexing on that
data?

Question 3 (and perhaps, the most important):
What are the best practices for compression (with or without indexing)? Are
folks typically using SequenceFile compression rather than other
compression formats (like BZip2)? If using SequenceFile compression, are folks
using record-based or block-based compression?


Thank you in advance!
Mark

--
Mark Grover, Business Intelligence Analyst
OANDA Corporation

www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.


