Re: Does block size determine the number of map tasks

2011-06-01 Thread Junxian Yan
Thanks. So that means Hadoop will treat CombineHiveInputFormat input as one block if the split parameters are not set, is that right? On Wed, Jun 1, 2011 at 6:44 PM, Steven Wong wrote: > When using CombineHiveInputFormat, parameters such as mapred.max.split.size > (and others) help determine how the input is split across mappers

RE: Does block size determine the number of map tasks

2011-06-01 Thread Steven Wong
When using CombineHiveInputFormat, parameters such as mapred.max.split.size (and others) help determine how the input is split across mappers. Other factors include whether your input files' format is a splittable format or not. Hope this helps. From: Junxian Yan [mailto:junxian@gmail.com]
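As a concrete illustration of the settings Steven mentions (a sketch, not from the thread itself; the byte values are placeholders you would tune for your cluster), these session settings control how CombineHiveInputFormat packs input blocks into splits:

```sql
-- Switch the job to the combining input format.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the bytes packed into one split (placeholder: 256 MB).
SET mapred.max.split.size=268435456;

-- Per-node and per-rack lower bounds also influence how blocks are combined.
SET mapred.min.split.size.per.node=134217728;
SET mapred.min.split.size.per.rack=134217728;
```

Roughly, more small files get combined until a split reaches mapred.max.split.size, so lowering that value yields more map tasks and raising it yields fewer.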

Hive logging concurrency

2011-06-01 Thread Steven Wong
By default, all Hive clients log to the same file called hive.log via DRFA. What I'm seeing is that many log lines are "lost" after hive.log is rolled over to hive.log.yyyy-MM-dd. Is this an issue with DRFA? What do folks do to avoid this problem when using concurrent Hive clients? Thanks. Steven
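One common workaround (a sketch, not an answer from this thread; the `-hiveconf` trick of passing a per-process log file name is an assumption on my part) is to give each client its own log file so concurrent processes never contend for the same DRFA file:

```properties
# hive-log4j.properties fragment: one log file per client instead of a shared hive.log.
# Launch each client with a unique name, e.g.:  hive -hiveconf hive.log.file=hive-$$.log
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.DRFA.DatePattern='.'yyyy-MM-dd
```

With distinct file names, each process rolls its own file over and no lines are dropped by a concurrent rename.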

Re: question about number of map tasks for small file

2011-06-01 Thread Edward Capriolo
On Wed, Jun 1, 2011 at 1:12 PM, Igor Tatarinov wrote: > Can you pre-aggregate your historical data to reduce the number of files? > > We used to partition our data by date but that created too many output > files so now we partition by month. > > I do find it odd that Hive (0.6) can't merge compressed output files.

Re: question about number of map tasks for small file

2011-06-01 Thread Igor Tatarinov
Can you pre-aggregate your historical data to reduce the number of files? We used to partition our data by date, but that created too many output files, so now we partition by month. I do find it odd that Hive (0.6) can't merge compressed output files. We could have gotten away with daily partition
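For reference (a sketch, not from the thread), the small-file merge behavior Igor is referring to is controlled by settings like these; the sizes shown are the usual defaults, not values from the discussion:

```sql
-- Merge small output files of map-only jobs.
SET hive.merge.mapfiles=true;
-- Also merge the outputs of full map-reduce jobs.
SET hive.merge.mapredfiles=true;
-- Target size for merged files, and the average-size threshold that triggers a merge.
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000;
```

When a job's average output file size falls below hive.merge.smallfiles.avgsize, Hive launches an extra job to concatenate the files up to hive.merge.size.per.task each.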

Does block size determine the number of map tasks

2011-06-01 Thread Junxian Yan
I saw this in the Hadoop wiki: http://wiki.apache.org/hadoop/HowManyMapsAndReduces But in my experiment I see a different result. When I set CombineHiveInputFormat in Hive, and per the doc the default block size should be 64M, but my input files are more than 64M, Hadoop still created one map task to

Re: question about number of map tasks for small file

2011-06-01 Thread Junxian Yan
Today I tried CombineHiveInputFormat and set the max split size for the Hadoop input. It seems I can get the expected number of map tasks. But another problem is that CPU usage by the map tasks is very high, almost 100%. I just ran a query with a simple WHERE condition over testing files, whose total size is about

Re: Hive basic questions

2011-06-01 Thread jinhang du
As far as I know: 1. An external table does not need to copy data from HDFS into your warehouse when loading data. 2. "Location" points to the data in HDFS and links that data to the table; when you drop the table, the data is not deleted. 3. The tables' metadata is stored in your metastore, i.e. Derby, MySQL
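A minimal illustration of points 1 and 2 above (a sketch; the table name, schema, and HDFS path are hypothetical, not from the thread):

```sql
-- Point the table at data that already lives in HDFS; nothing is copied
-- into the Hive warehouse directory.
CREATE EXTERNAL TABLE page_views (   -- hypothetical name and schema
  view_time STRING,
  user_id   STRING,
  url       STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs/page_views';    -- hypothetical HDFS path

-- Dropping an EXTERNAL table removes only the metastore entry;
-- the files under /data/logs/page_views remain in HDFS.
DROP TABLE page_views;
```

Had the table been created without EXTERNAL, the same DROP would also delete the underlying files.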