Re: Running hive on large number of files in S3

2011-10-21 Thread Thulasi Ram Naidu Peddineni
ery=true to
> speed up part b. See https://forums.aws.amazon.com/ann.jspa?annID=1105 for
> more info.
>
> From: Ashutosh Chauhan [mailto:hashut...@apache.org]
> Sent: Thursday, October 20, 2011 1:21 PM
> To: user@hive.apache.org
> Subject: Re: Running

RE: Running hive on large number of files in S3

2011-10-20 Thread Steven Wong
: Running hive on large number of files in S3 Hey Thulasi, There are two factors that may affect job startup time when there is a large number of partitions: a) Getting partition info from the metastore: Hive stores metadata about each partition in the metastore. Depending on the number of partitions, it needs to

Re: Running hive on large number of files in S3

2011-10-20 Thread Ashutosh Chauhan
Hey Thulasi, There are two factors that may affect job startup time when there is a large number of partitions: a) Getting partition info from the metastore: Hive stores metadata about each partition in the metastore. Depending on the number of partitions it needs to fetch, that can take some time. b) Input spli

Re: Running hive on large number of files in S3

2011-10-20 Thread Jerome Boulon
Hi, I don't think that your job is actually prefetching the data while you're waiting. If you have a large number of partitions, then getting the list of files to compute the splits (that is, prefetching the filenames from S3) is what is taking forever. If you have premium support from Amazon you may w