Thanks Owen and Hongzhan.
I understand the behavior now.
On Tue, Aug 13, 2013 at 6:28 AM, hongzhan li wrote:
> if you select all the columns, ORC will not be faster than textfile. But
> if you select some columns (not all of the columns), ORC will run faster.
> —
>
>
> On Mon, Aug 12, 2013 at 6:40 PM, pandees waran wrote:
if you select all the columns, ORC will not be faster than textfile. But if you
select some columns (not all of the columns), ORC will run faster.
—
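Hongzhan's point about column selection can be tried directly; a minimal HiveQL sketch, where the table and column names are hypothetical:

```sql
-- Hedged sketch; table and column names are hypothetical.
-- Reads every column: ORC gains little over TEXTFILE here.
SELECT * FROM test_orc;

-- Reads only two columns: ORC can skip the other column streams
-- entirely, which is where the speedup comes from.
SELECT col1, SUM(col6) FROM test_orc GROUP BY col1;
```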
On Mon, Aug 12, 2013 at 6:40 PM, pandees waran wrote:
> Hi,
> Currently, we use the TEXTFILE format in Hive 0.8, while creating the
> external tables in
Ok, given the large number of doubles in the schema and bzip2 compression,
I can see why the text would be smaller.
ORC doesn't do compression on floats or doubles, although there is a jira
to do so. (https://issues.apache.org/jira/browse/HIVE-3889)
Bzip is a very aggressive compressor. We should
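For reference, ORC's general-purpose codec is chosen per table via TBLPROPERTIES; a hedged sketch (table and column names hypothetical). Note this codec applies to the encoded streams, which is separate from the type-specific double encoding that HIVE-3889 tracks:

```sql
-- Hedged sketch; table name is hypothetical.
-- "orc.compress" selects ORC's general-purpose codec: NONE, ZLIB, or SNAPPY.
CREATE TABLE test_orc_zlib (
  col1 BIGINT,
  col2 STRING,
  col3 DOUBLE
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");
```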
Hi Owen,
Thanks for your response.
My structure is like:
a) Textfile:
CREATE EXTERNAL TABLE test_textfile (
COL1 BIGINT,
COL2 STRING,
COL3 BIGINT,
COL4 STRING,
COL5 STRING,
COL6 BIGINT,
COL7 BIGINT,
COL8 BIGINT,
COL9 BIGINT,
COL10 BIGINT,
COL11 BIGINT,
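The ORC counterpart being compared in this thread would differ only in the storage clause; a sketch assuming the same column list as above (truncated here as in the excerpt):

```sql
-- Sketch; mirrors test_textfile's columns (list truncated as in the excerpt).
CREATE EXTERNAL TABLE test_orc (
  COL1 BIGINT,
  COL2 STRING,
  COL3 BIGINT
  -- ... remaining columns as in test_textfile ...
)
STORED AS ORC;
```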
Pandees,
I've never seen a table that was larger with ORC than with text. Can you
share your text file's schema with us? Is the table very small? How many
rows and GB are the tables? The overhead for ORC is typically small, but as
Ed says, it is possible in rare cases for the overhead to dominate.
Thanks Edward. I shall try compression besides ORC and let you know. And
also, it looks like the CPU usage is lower while querying ORC rather
than the text file.
But the total time taken by the query is slightly more for ORC than the
text file. Could you please explain the difference between cumul
Columnar formats do not always beat row-wise storage. Many times gzip plus
block storage will compress something better than columnar storage,
especially when you have repeated data in different columns.
Based on what you are saying it could be possible that you missed a setting
and the ORC files are not
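Edward's suggestion of gzip plus block storage on the text side maps onto Hive's output-compression settings; a hedged sketch (property names are from the Hive 0.x / Hadoop 1.x era, and the source table name is hypothetical; verify against your versions):

```sql
-- Hedged sketch: compress data written into the TEXTFILE table.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

-- Rewrite the table so its files are block-compressed (source is hypothetical).
INSERT OVERWRITE TABLE test_textfile SELECT * FROM source_table;
```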