Re: ORC vs TEXT file

2013-08-12 Thread pandees waran
Thanks Owen and Hongzhan. I understand the behavior now.
On Tue, Aug 13, 2013 at 6:28 AM, hongzhan li wrote:
> If you select all the columns, ORC will not be faster than textfile, but
> if you select only some of the columns, ORC will run faster.
> --
> > On Mon, Aug 12, 2013 at 6:

Re: ORC vs TEXT file

2013-08-12 Thread hongzhan li
If you select all the columns, ORC will not be faster than textfile, but if you select only some of the columns, ORC will run faster.
--
On Mon, Aug 12, 2013 at 6:40 PM, pandees waran wrote:
> Hi,
> Currently, we use TEXTFILE format in Hive 0.8 while creating the
> external tables in
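
The point about selecting all columns versus only some can be illustrated with a rough toy model. This is only a sketch in Python under simplifying assumptions: the table, the `rows` data, and both function names are hypothetical, and the byte counts model plain delimited text rather than the actual on-disk layout of Hive text files or ORC.

```python
# Toy model only: this is not how Hive or ORC lay out bytes on disk.
rows = [(i, f"name-{i}", i * 2, f"city-{i % 10}") for i in range(1000)]


def bytes_scanned_row_layout(rows, wanted):
    # A delimited text file stores whole rows, so every field is read
    # even when the query needs only some of them; `wanted` is ignored.
    return sum(len(",".join(str(v) for v in row)) + 1 for row in rows)


def bytes_scanned_column_layout(rows, wanted):
    # A columnar file keeps each column contiguous, so only the
    # requested columns have to be read (one separator byte per value).
    return sum(len(str(row[c])) + 1 for row in rows for c in wanted)


all_columns = [0, 1, 2, 3]
print(bytes_scanned_row_layout(rows, all_columns))
print(bytes_scanned_column_layout(rows, all_columns))
print(bytes_scanned_column_layout(rows, [0]))
```

In this model, reading every column costs the same number of bytes in both layouts, while reading a single column touches only a fraction of the data in the columnar layout. Real query times also depend on compression, decoding cost, and job startup overhead, none of which are modeled here.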

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley
Ok, given the large number of doubles in the schema and bzip2 compression, I can see why the text would be smaller. ORC doesn't do compression on floats or doubles, although there is a jira to do so. (https://issues.apache.org/jira/browse/HIVE-3889) Bzip is a very aggressive compressor. We should

Re: ORC vs TEXT file

2013-08-12 Thread pandees waran
Hi Owen, Thanks for your response. My structure is like:
a) Textfile:
CREATE EXTERNAL TABLE test_textfile (
  COL1 BIGINT, COL2 STRING, COL3 BIGINT, COL4 STRING, COL5 STRING,
  COL6 BIGINT, COL7 BIGINT, COL8 BIGINT, COL9 BIGINT, COl10 BIGINT,
  COl11 BIGINT,

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley
Pandees, I've never seen a table that was larger with ORC than with text. Can you share your text file's schema with us? Is the table very small? How many rows and GB are the tables? The overhead for ORC is typically small, but as Ed says it is possible in rare cases for the overhead to dominate

Re: ORC vs TEXT file

2013-08-12 Thread pandees waran
Thanks Edward. I shall try compression alongside ORC and let you know. Also, it looks like the CPU usage is lower when querying ORC than when querying the text file, but the total time taken by the query is slightly higher with ORC than with the text file. Could you please explain the difference between cumul

Re: ORC vs TEXT file

2013-08-12 Thread Edward Capriolo
Columnar formats do not always beat row-wise storage. Many times gzip plus block storage will compress something better than columnar storage, especially when you have repeated data in different columns. Based on what you are saying, it could be possible that you missed a setting and the ORC files are not
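
The remark about repeated data in different columns can be sketched with a small experiment. This is a toy illustration in Python, assuming made-up data in which one random token appears in two columns of every row; it uses `zlib` (the same DEFLATE algorithm as gzip) on plain delimited text and does not reproduce ORC's actual encodings or Hive's block compression.

```python
import random
import zlib

# Toy data, not ORC's real encoding: each row repeats one random token in
# two different columns, the case where row-wise compression has an edge.
rng = random.Random(0)
tokens = ["%016x" % rng.getrandbits(64) for _ in range(5000)]

# Row-wise layout: the repeated value sits right next to its first copy,
# so the compressor can replace it with a short back-reference.
row_layout = "".join(f"{t},{t}\n" for t in tokens).encode()

# Column-wise layout: the second copy is about 85 KB after the first,
# beyond the 32 KB window that DEFLATE can look back into.
column = "".join(f"{t}\n" for t in tokens)
column_layout = (column + column).encode()

row_size = len(zlib.compress(row_layout))
column_size = len(zlib.compress(column_layout))
print(len(row_layout), row_size)
print(len(column_layout), column_size)
```

Both layouts hold the same number of raw bytes, yet the row-wise one compresses to fewer bytes here because the duplicate is within the compressor's reach. With other data shapes, such as long runs of similar values within one column, the columnar layout would be expected to do better.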