Thanks. So that means Hadoop will treat the CombineHiveInputFormat input as a
single split if the split parameters are not set, is that right?
R
On Wed, Jun 1, 2011 at 6:44 PM, Steven Wong wrote:
> When using CombineHiveInputFormat, parameters such as mapred.max.split.size
> (and others) help determine how the input is split across mappers.
When using CombineHiveInputFormat, parameters such as mapred.max.split.size
(and others) help determine how the input is split across mappers. Other
factors include whether your input files are in a splittable format.
Hope this helps.
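For reference, a minimal sketch of how these parameters are typically set from
the Hive CLI; the numeric values are only illustrative, not recommendations:

  set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  set mapred.max.split.size=256000000;
  set mapred.min.split.size.per.node=128000000;
  set mapred.min.split.size.per.rack=128000000;

Smaller values of mapred.max.split.size generally produce more splits and
therefore more map tasks.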
By default, all Hive clients log to the same file called hive.log via DRFA.
What I'm seeing is that many log lines are "lost" after hive.log is rolled over
to hive.log.yyyy-MM-dd. Is this an issue with DRFA? What do folks do to avoid
this problem when using concurrent Hive clients?
Thanks.
Steven
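(One common workaround, a sketch assuming the stock hive-log4j.properties,
which builds the log path from the hive.log.dir and hive.log.file properties,
is to give each concurrent client its own log file; the directory and file
name below are placeholders:

  hive --hiveconf hive.log.dir=/tmp/hive-logs --hiveconf hive.log.file=hive_client_a.log

The same properties can also be edited directly in conf/hive-log4j.properties.)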
On Wed, Jun 1, 2011 at 1:12 PM, Igor Tatarinov wrote:
> Can you pre-aggregate your historical data to reduce the number of files?
>
> We used to partition our data by date but that created too many output
> files so now we partition by month.
>
> I do find it odd that Hive (0.6) can't merge compressed output files.
Can you pre-aggregate your historical data to reduce the number of files?
We used to partition our data by date but that created too many output files
so now we partition by month.
I do find it odd that Hive (0.6) can't merge compressed output files. We
could have gotten away with daily partitions.
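For reference, the settings that normally control Hive's small-file merge step
are sketched below; per the message above, the merge does not apply to
compressed output in 0.6, so this only shows where the knobs live (values are
illustrative):

  set hive.merge.mapfiles=true;
  set hive.merge.mapredfiles=true;
  set hive.merge.size.per.task=256000000;
  set hive.merge.smallfiles.avgsize=16000000;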
I saw this in hadoop wiki:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
But in my experiment, I see a different result. I set CombineHiveInputFormat
in Hive, and according to the doc the default block size should be 64 MB, but
even though my input files are more than 64 MB, Hadoop still created only one
map task to process them.
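(If the goal is the older one-split-per-block behavior instead of combined
splits, one option is to switch the session back to the non-combining input
format; a sketch, not a recommendation:

  set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

With HiveInputFormat, the number of splits is driven by the underlying file
format and block size rather than by the combine parameters.)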
Today I tried CombineHiveInputFormat and set the max split size for the Hadoop
input. It seems I can get the expected number of map tasks. But another problem
is that the map tasks consume CPU heavily, almost 100%. I just ran a query with
a simple WHERE condition over test files, whose total size is about
As far as I know,
1. An external table does not copy data from HDFS into your warehouse directory
when loading data.
2. LOCATION points to the data in HDFS and links the data to the table. When
you drop the table, the data is not deleted (see the sketch below).
3. The table's metadata is stored in your metastore, e.g. Derby or MySQL.
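A minimal sketch of points 1 and 2; the table name, columns, and path are made
up for illustration:

  CREATE EXTERNAL TABLE page_views (
    view_time STRING,
    user_id STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/user/data/page_views';

  -- Dropping the table removes only the metadata from the metastore;
  -- the files under /user/data/page_views remain in HDFS.
  DROP TABLE page_views;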