subject:"Re\: ORC vs TEXT file"

Re: ORC vs TEXT file

2013-08-12 Thread pandees waran

Thanks Owen and Hongzhan. I understand the behavior now.
On Tue, Aug 13, 2013 at 6:28 AM, hongzhan li wrote:
> If you select all the columns, ORC will not be faster than a text file, but
> if you select only some of the columns (not all of them), ORC will run faster.
>
> On Mon, Aug 12, 2013 at 6:

Re: ORC vs TEXT file

2013-08-12 Thread hongzhan li

If you select all the columns, ORC will not be faster than a text file, but if you select only some of the columns (not all of them), ORC will run faster.
On Mon, Aug 12, 2013 at 6:40 PM, pandees waran wrote:
> Hi,
> Currently, we use TEXTFILE format in Hive 0.8, while creating the
> external tables in
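The column-selection point in this message can be illustrated with a toy model. This is only a sketch under stated assumptions: the column names and byte sizes are made up, and the two functions are hypothetical helpers, not the actual read path of ORC or Hive.

```python
# Toy model: how many bytes must be scanned to answer a query, by layout.
# The per-column sizes are invented for illustration only.

def bytes_scanned_row_layout(column_sizes, selected):
    # A row-oriented text file keeps whole rows together, so every
    # column is read even when only a few are selected.
    return sum(column_sizes.values())

def bytes_scanned_columnar_layout(column_sizes, selected):
    # A columnar file keeps each column separately, so only the
    # selected columns need to be read.
    return sum(column_sizes[name] for name in selected)

column_sizes = {"col1": 800, "col2": 4000, "col3": 800, "col4": 2400}
all_columns = list(column_sizes)
some_columns = ["col1", "col3"]

# Selecting every column: both layouts scan the same amount.
row_all = bytes_scanned_row_layout(column_sizes, all_columns)
col_all = bytes_scanned_columnar_layout(column_sizes, all_columns)

# Selecting a subset: only the columnar layout scans less.
row_some = bytes_scanned_row_layout(column_sizes, some_columns)
col_some = bytes_scanned_columnar_layout(column_sizes, some_columns)

print(row_all, col_all)    # 8000 8000
print(row_some, col_some)  # 8000 1600
```

In this model the columnar layout has no advantage when all columns are selected, which matches the behavior described above.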

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley

Ok, given the large number of doubles in the schema and bzip2 compression, I can see why the text would be smaller. ORC doesn't do compression on floats or doubles, although there is a jira to do so. (https://issues.apache.org/jira/browse/HIVE-3889) Bzip is a very aggressive compressor. We should
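A rough way to see why bzip2-compressed text can end up smaller than uncompressed binary doubles is sketched below. It is an illustration only, not a measurement of ORC: the data is randomly generated with two decimal places (an assumption), and the binary side is simply 8 bytes per value with no encoding or compression.

```python
import bz2
import random
import struct

rng = random.Random(0)  # fixed seed so runs are repeatable

# Assumed data: 10,000 values with two decimal places.
values = [round(rng.uniform(0, 1000), 2) for _ in range(10_000)]

# Text layout: one decimal string per line, then bzip2 over the whole file.
text_bytes = "\n".join(str(v) for v in values).encode("ascii")
bzip2_text = bz2.compress(text_bytes)

# Binary layout: 8 bytes per double, left uncompressed.
raw_doubles = struct.pack(f"<{len(values)}d", *values)

print("plain text bytes:     ", len(text_bytes))
print("bzip2 text bytes:     ", len(bzip2_text))
print("raw double bytes:     ", len(raw_doubles))
```

Short decimal strings carry fewer than 8 bytes of information per value, so a strong general-purpose compressor over text can beat an uncompressed fixed-width binary encoding.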

Re: ORC vs TEXT file

2013-08-12 Thread pandees waran

Hi Owen,
Thanks for your response. My structure is like:
a) Textfile:
CREATE EXTERNAL TABLE test_textfile (
  COL1 BIGINT,
  COL2 STRING,
  COL3 BIGINT,
  COL4 STRING,
  COL5 STRING,
  COL6 BIGINT,
  COL7 BIGINT,
  COL8 BIGINT,
  COL9 BIGINT,
  COl10 BIGINT,
  COl11 BIGINT,

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley

Pandees, I've never seen a table that was larger with ORC than with text. Can you share your text file's schema with us? Is the table very small? How many rows and GB are the tables? The overhead for ORC is typically small, but as Ed says it is possible in rare cases for the overhead to dominate

Re: ORC vs TEXT file

2013-08-12 Thread pandees waran

Thanks Edward. I will try compression together with ORC and let you know. Also, it looks like CPU usage is lower when querying ORC than when querying the text file, but the total query time is slightly higher for ORC than for the text file. Could you please explain the difference between cumul

Re: ORC vs TEXT file

2013-08-12 Thread Edward Capriolo

Columnar formats do not always beat row-wise storage. Many times gzip plus block storage will compress something better than columnar storage, especially when you have repeated data in different columns. Based on what you are saying, it could be possible that you missed a setting and the ORC are not
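The remark about repeated data in different columns can be sketched with a small experiment. This is a simplified illustration, not ORC's actual encoding: the data is an assumed table of two columns holding the same random 16-character value per row, and `zlib` (DEFLATE, the algorithm behind gzip) stands in for the compressor on both sides.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so runs are repeatable
HEX = "0123456789abcdef"

# Assumed data: 5,000 rows where column A and column B hold the same value.
ids = ["".join(rng.choices(HEX, k=16)) for _ in range(5_000)]

# Row-wise: both copies sit side by side in each row, so the compressor
# can encode the second copy as a short back-reference to the first.
row_wise = "\n".join(f"{v},{v}" for v in ids).encode("ascii")

# Column-wise: each column is stored and compressed on its own, so the
# random values have to be encoded in full twice.
col_a = "\n".join(ids).encode("ascii")
col_b = "\n".join(ids).encode("ascii")

row_size = len(zlib.compress(row_wise))
col_size = len(zlib.compress(col_a)) + len(zlib.compress(col_b))

print("row-wise compressed bytes:   ", row_size)
print("column-wise compressed bytes:", col_size)
```

With this kind of cross-column redundancy the row-wise file compresses smaller, because a per-column compressor never sees the two copies together.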

7 matches
