Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Sreenath Menon
OK, I am getting a little confused now. Consider a scenario where there is no limit on the memory available. In such a scenario, is there any advantage to storing data in HDFS in compressed format? Any advantage, like, if node 1 has data available and it is executing a particular t

Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Denny Lee

Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Sreenath Menon
Any idea about LZO or bzip2? Is either of these splittable?

Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Raja Thiruvathuru

Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Edward Capriolo

Re: Compressed data storage in HDFS - Error

2012-06-08 Thread Mark Grover

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Vinod Singh

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Debarshi Basak
Subject: Re: Compressed data storage in HDFS - Error Hi Bejoy, I would like to make this clear. There is no gain in processing throughput/time from compressing the data stored in HDFS (not talking about intermediate compression), right? And do I need to add the LZO libraries in Hadoop_Home/lib/native for all

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Bejoy Ks
hive client. Regards, Bejoy KS From: Sreenath Menon To: user@hive.apache.org; Bejoy Ks Sent: Wednesday, June 6, 2012 3:25 PM Subject: Re: Compressed data storage in HDFS - Error Hi Bejoy, I would like to make this clear. There is no gain in processing

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Sreenath Menon
Hi Bejoy, I would like to make this clear. There is no gain in processing throughput/time from compressing the data stored in HDFS (not talking about intermediate compression), right? And do I need to add the LZO libraries in Hadoop_Home/lib/native on all the nodes (including the slave nodes)?

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Bejoy Ks
From: Sreenath Menon To: user@hive.apache.org Sent: Wednesday, June 6, 2012 3:08 PM Subject: Re: Compressed data storage in HDFS - Error Thanks for the response. 1) How do I use Gz compression, and does it come with Hadoop? Or else, how do I build a compression method for use in Hive? I would like

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Bejoy Ks
=true mapred.map.output.compression.codec= hadoop.compression.lzo.LzoCodec Regards, Bejoy KS From: Siddharth Tiwari To: "user@hive.apache.org" Sent: Wednesday, June 6, 2012 2:58 PM Subject: RE: Compressed data storage in HDFS - Error There is something you gain and some

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Sreenath Menon
OK, understood. So you load the compressed data into memory (thereby decreasing the size of the file that needs to be loaded) and then apply the decompression algorithm to get the uncompressed data. Is this what happens?
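
The read path asked about here can be illustrated outside Hadoop with a minimal Python sketch. It is only an analogy and does not use Hadoop's own codec classes: the file name and sample rows are made up, and Python's standard `gzip` module stands in for the codec. Fewer bytes are stored and read, and the reader decompresses them in memory as it streams.

```python
import gzip
import os
import tempfile


def write_and_read_back(rows):
    """Write rows to a gzip file, then stream them back, decompressing on the fly."""
    text = "\n".join(rows) + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
        with gzip.open(path, "wt", encoding="utf-8") as out:
            out.write(text)
        bytes_on_disk = os.path.getsize(path)  # what actually has to be read
        with gzip.open(path, "rt", encoding="utf-8") as src:
            # Each line is decompressed in memory as the file is streamed.
            recovered = [line.rstrip("\n") for line in src]
    return len(text.encode("utf-8")), bytes_on_disk, recovered


# Made-up, repetitive sample rows (an assumption for illustration only).
rows = ["2012-06-06,user%d,click" % (i % 50) for i in range(10000)]
raw_size, disk_size, recovered = write_and_read_back(rows)
print("uncompressed bytes:", raw_size)
print("bytes on disk:", disk_size)
print("round trip ok:", recovered == rows)
```

The saving is in bytes read from storage; the decompression step still costs CPU time on the reader.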

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Sreenath Menon
Thanks for the response. 1) How do I use Gz compression, and does it come with Hadoop? Or else, how do I build a compression method for use in Hive? I would like to run an evaluation across compression methods. What is the default compression used in Hadoop? 2) Kindly bear with me if this question

RE: Compressed data storage in HDFS - Error

2012-06-06 Thread Siddharth Tiwari
There is something you gain and something you lose. Compression reduces IO at the cost of increased CPU work. You will also see different results for different tasks, i.e. HDFS read, HDFS write, shuffle, and sort. So whether to go for compression depends on your usage.
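
One rough way to see this trade-off before touching a cluster is to time a couple of codecs on a sample of your own data. The sketch below is a local approximation only. It uses Python's standard `zlib` (the DEFLATE algorithm that gzip uses) and `bz2` modules rather than Hadoop's codecs, and the sample payload is made up. LZO is left out because it is not in the Python standard library.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, payload):
    """Return size ratio and wall-clock timings for one codec on one payload."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = decompress(packed)
    decompress_s = time.perf_counter() - start

    assert unpacked == payload  # the round trip must be lossless
    return {
        "codec": name,
        "ratio": len(packed) / len(payload),  # smaller means less IO
        "compress_s": compress_s,             # CPU cost paid on write
        "decompress_s": decompress_s,         # CPU cost paid on every read
    }


# Made-up, repetitive sample records (an assumption; use real data in practice).
payload = b"".join(b"2012-06-06\tuser%d\tclick\n" % (i % 97) for i in range(50000))

results = [
    measure("zlib (DEFLATE)", zlib.compress, zlib.decompress, payload),
    measure("bz2", bz2.compress, bz2.decompress, payload),
]
for r in results:
    print("%-15s ratio=%.3f compress=%.4fs decompress=%.4fs"
          % (r["codec"], r["ratio"], r["compress_s"], r["decompress_s"]))
```

Timings on a laptop will not match a cluster, where network and disk throughput also matter, so treat the numbers as a relative comparison between codecs and not as a prediction of job run time.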

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Debarshi Basak
Basically, when your data is compressed you have less IO than with uncompressed data. During job execution it doesn't decompress. This would be a more relevant question for Hadoop's mailing list than Hive's.

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Debarshi Basak
Yes, performance is better because your IO is less when your data is smaller.

Re: Compressed data storage in HDFS - Error

2012-06-06 Thread Debarshi Basak
LZO doesn't ship with Apache Hadoop; you need to build it. Try GZ.