I'm not seeing any documentation link in Sanjay's message, so here it is again (in the Hive wiki's language manual): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:

> Please refer to this documentation here.
> Let me know if you need more clarifications so that we can make this
> document better and complete.
>
> Thanks
>
> sanjay
>
> From: w00t w00t <w00...@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
> Date: Thursday, August 8, 2013 2:02 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Hive and Lzo Compression
>
> Hello,
>
> I have started running Hive with Lzo compression on Hortonworks 1.2.
>
> I have managed to install/configure Lzo, and
> hive -e "set io.compression.codecs" shows me the Lzo codecs:
>
> io.compression.codecs=
> org.apache.hadoop.io.compress.GzipCodec,
> org.apache.hadoop.io.compress.DefaultCodec,
> com.hadoop.compression.lzo.LzoCodec,
> com.hadoop.compression.lzo.LzopCodec,
> org.apache.hadoop.io.compress.BZip2Codec
>
> However, I have some questions where I would be happy if you could help me.
>
> (1) CREATE TABLE statement
>
> I read in different postings that in the CREATE TABLE statement I have
> to use the following STORAGE clause:
>
> CREATE EXTERNAL TABLE txt_table_lzo (
>   txt_line STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
> STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION '/user/myuser/data/in/lzo_compressed';
>
> It now works without any problems to execute SELECT statements on this
> table with Lzo data.
>
> However, I also created a table on the same data without this STORAGE
> clause:
>
> CREATE EXTERNAL TABLE txt_table_lzo_tst (
>   txt_line STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
> LOCATION '/user/myuser/data/in/lzo_compressed';
>
> The interesting thing is, it works as well when I execute a SELECT
> statement on this table.
>
> Can you help explain why the second CREATE TABLE statement works as well?
> What should I use in DDLs?
> Is it best practice to use the STORED AS clause with a
> "DeprecatedLzoTextInputFormat"? Or should I remove it?
>
> (2) Output and Intermediate Compression Settings
>
> I want to use output compression.
>
> In "Programming Hive" from Capriolo, Wampler, Rutherglen the following
> commands are recommended:
>
> SET hive.exec.compress.output=true;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> However, in some other places in forums, I found the following
> recommended settings:
>
> SET hive.exec.compress.output=true;
> SET mapreduce.output.fileoutputformat.compress=true;
> SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Am I right that the first settings are for Hadoop versions prior to 0.23?
> Or is there any other reason why the settings are different?
>
> I am using Hadoop 1.1.2 with Hive 0.10.0.
> Which settings would you recommend to use?
>
> --------------
> I also want to compress intermediate results.
>
> Again, in "Programming Hive" the following settings are recommended:
>
> SET hive.exec.compress.intermediate=true;
> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Is this the right setting?
>
> Or should I again use the settings (which look more valid for
> Hadoop 0.23 and greater)?:
>
> SET hive.exec.compress.intermediate=true;
> SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Thanks

-- Lefty