As I understand it, the underlying Hadoop framework transparently detects whether input files are compressed. If they are, the framework takes care of the decompression, provided the matching compression codecs are available.
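To make that a bit more concrete: as far as I know, the default text input path picks the codec from the file name suffix (CompressionCodecFactory.getCodec matches the path against each registered codec's getDefaultExtension()). The Python sketch below is only a simplified model of that lookup, not Hadoop's actual code. The `CODEC_BY_SUFFIX` table, the `pick_codec` helper and the sample file name `part-00000.lzo` are made up for illustration; the class names are the ones from the io.compression.codecs list quoted further down.

```python
from typing import Optional

# Default file suffix for each codec from the io.compression.codecs list in this thread.
# Illustrative table only: in Hadoop the suffix comes from each codec's getDefaultExtension().
CODEC_BY_SUFFIX = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
}


def pick_codec(path: str) -> Optional[str]:
    """Return the codec class name whose suffix matches the path, or None for plain files."""
    matches = [suffix for suffix in CODEC_BY_SUFFIX if path.endswith(suffix)]
    if not matches:
        # No registered suffix matched: the file is read as uncompressed text.
        return None
    # If several suffixes match, prefer the longest one.
    return CODEC_BY_SUFFIX[max(matches, key=len)]


# Hypothetical file under the table location used in this thread.
print(pick_codec("/user/myuser/data/in/lzo_compressed/part-00000.lzo"))
```

If that model is right, a table created without STORED AS falls back to the default text input format, which can still decompress the .lzo files because the suffix maps to LzopCodec. One caveat, to the best of my knowledge: the default text input format does not split .lzo files, so declaring DeprecatedLzoTextInputFormat (together with LZO index files) is still what gives you splittable input on large files.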

On Thu, Aug 15, 2013 at 4:20 AM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:

> I am not sure if in this case data is loaded
> OR partition added with location specified (to some location in HDFS)
>
> Yes, you are stating the question correctly
>
> sanjay
>
> From: Nitin Pawar <nitinpawar...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Wednesday, August 14, 2013 10:54 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
> Please correct me if I have not understood the question correctly.
>
> You created a table definition without a STORED AS clause,
> then you loaded data into the table from a compressed file,
> then ran a SELECT query and it still works,
> but how did it figure out which compression codec to use?
>
> Am I stating it correctly?
>
> On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
>
>> That is really interesting…let me try and think of a reason…meanwhile
>> any other LZO Hive Samurais out there? Please help with some guidance
>>
>> sanjay
>>
>> From: w00t w00t <w00...@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
>> Date: Wednesday, August 14, 2013 1:15 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: Hive and Lzo Compression
>>
>> Thanks for your reply.
>>
>> The interesting thing I experience is that the SELECT query still works
>> - even when I do not specify the STORED AS clause... that puzzles me a bit.
>>
>> ------------------------------
>> *From:* Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
>> *To:* "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <w00...@yahoo.de>
>> *Sent:* 3:44 Wednesday, 14 August 2013
>> *Subject:* Re: Hive and Lzo Compression
>>
>> Hi
>>
>> I think the CREATE TABLE without the STORED AS clause will not give any
>> errors while creating the table.
>> However, when you query that table, since it contains .lzo files,
>> you would get errors.
>> With external tables, you are separating the table creation (definition)
>> from the data. So only when that table is queried might Hive report
>> errors.
>>
>> LZO compression rocks! I am so glad I used it in our projects here.
>>
>> Regards
>>
>> sanjay
>>
>> From: w00t w00t <w00...@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
>> Date: Tuesday, August 13, 2013 12:13 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: Hive and Lzo Compression
>>
>> Thanks for your replies and the link.
>>
>> I could get it working, but wondered why the CREATE TABLE statement
>> worked without the STORED AS clause as well...that's what puzzles me a
>> bit...
>>
>> But I will use the STORED AS clause to be on the safe side.
>>
>> ------------------------------
>> *From:* Lefty Leverenz <leftylever...@gmail.com>
>> *To:* user@hive.apache.org
>> *CC:* w00t w00t <w00...@yahoo.de>
>> *Sent:* 19:06 Saturday, 10 August 2013
>> *Subject:* Re: Hive and Lzo Compression
>>
>> I'm not seeing any documentation link in Sanjay's message, so here it
>> is again (in the Hive wiki's language manual):
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
>>
>> On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
>>
>> Please refer to the documentation here.
>> Let me know if you need more clarification so that we can make this
>> document better and more complete.
>>
>> Thanks
>>
>> sanjay
>>
>> From: w00t w00t <w00...@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
>> Date: Thursday, August 8, 2013 2:02 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Hive and Lzo Compression
>>
>> Hello,
>>
>> I have started to run Hive with Lzo compression on Hortonworks 1.2.
>>
>> I have managed to install/configure Lzo, and hive -e "set
>> io.compression.codecs" shows me the Lzo codecs:
>> io.compression.codecs=
>> org.apache.hadoop.io.compress.GzipCodec,
>> org.apache.hadoop.io.compress.DefaultCodec,
>> com.hadoop.compression.lzo.LzoCodec,
>> com.hadoop.compression.lzo.LzopCodec,
>> org.apache.hadoop.io.compress.BZip2Codec
>>
>> However, I have some questions where I would be happy if you could help
>> me.
>>
>> (1) CREATE TABLE statement
>>
>> I read in different postings that in the CREATE TABLE statement I have
>> to use the following STORAGE clause:
>>
>> CREATE EXTERNAL TABLE txt_table_lzo (
>>   txt_line STRING
>> )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>> STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
>> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>> SELECT statements on this table with Lzo data now run without any
>> problems.
>>
>> However, I also created a table on the same data without this STORAGE
>> clause:
>>
>> CREATE EXTERNAL TABLE txt_table_lzo_tst (
>>   txt_line STRING
>> )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>> LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>> The interesting thing is, it works as well when I execute a SELECT
>> statement on this table.
>>
>> Can you help me understand why the second CREATE TABLE statement works
>> as well?
>> What should I use in DDLs?
>> Is it best practice to use the STORED AS clause with a
>> "DeprecatedLzoTextInputFormat"? Or should I remove it?
>>
>> (2) Output and Intermediate Compression Settings
>>
>> I want to use output compression.
>>
>> In "Programming Hive" by Capriolo, Wampler, Rutherglen the following
>> commands are recommended:
>> SET hive.exec.compress.output=true;
>> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> However, in some other places in forums, I found the following
>> recommended settings:
>> SET hive.exec.compress.output=true;
>> SET mapreduce.output.fileoutputformat.compress=true;
>> SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Am I right that the first settings are for Hadoop versions prior to 0.23?
>> Or is there any other reason why the settings are different?
>>
>> I am using Hadoop 1.1.2 with Hive 0.10.0.
>> Which settings would you recommend?
>>
>> --------------
>> I also want to compress intermediate results.
>>
>> Again, in "Programming Hive" the following settings are
>> recommended:
>> SET hive.exec.compress.intermediate=true;
>> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Is this the right setting?
>>
>> Or should I again use the settings (which look more valid for
>> Hadoop 0.23 and greater)?:
>> SET hive.exec.compress.intermediate=true;
>> SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Thanks
>>
>> CONFIDENTIALITY NOTICE
>> ======================
>> This email message and any attachments are for the exclusive use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or distribution is
>> prohibited. If you are not the intended recipient, please contact the
>> sender by reply email and destroy all copies of the original message along
>> with any attachments, from your computer system. If you are the intended
>> recipient, please be advised that the content of this message is subject to
>> access, review and disclosure by the sender's Email System Administrator.
>>
>> -- Lefty
>
> --
> Nitin Pawar

--
Nitin Pawar