ery=true to
> speed up part b. See https://forums.aws.amazon.com/ann.jspa?annID=1105 for
> more info.
>
> From: Ashutosh Chauhan [mailto:hashut...@apache.org]
> Sent: Thursday, October 20, 2011 1:21 PM
> To: user@hive.apache.org
> Subject: Re: Running
: Running hive on large number of files in S3
Hey Thulasi,
There are two factors which may affect job startup time when there is a large
number of partitions:
a) Getting partition info from the metastore: Hive stores metadata about each
partition in the metastore. Depending on the number of partitions it needs to
fetch, that can take some time.
b) Input spli
Hi,
I don't think that your job is actually prefetching the data while you're
waiting.
If you have a large number of partitions, then getting the list of files to
compute the splits
(aka prefetching the filenames from S3) is what is taking forever.
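To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It
assumes one listing request per partition and a fixed latency per request; the
function name, the parameters and the figures in the usage lines are all
hypothetical, not measurements from this job or anything Hive itself exposes.

```python
import math


def estimate_listing_seconds(num_partitions, seconds_per_list_call, parallel_listers=1):
    """Rough time spent listing partition directories before any task starts.

    Assumption (not from Hive itself): one listing request per partition,
    each taking the same fixed latency.
    """
    if num_partitions < 0 or seconds_per_list_call < 0 or parallel_listers < 1:
        raise ValueError("counts and latency must be non-negative, listers >= 1")
    # Listing proceeds in batches of up to `parallel_listers` requests at a time.
    batches = math.ceil(num_partitions / parallel_listers)
    return batches * seconds_per_list_call


# Hypothetical figures: 10,000 partitions at 0.25 s per listing request.
print(estimate_listing_seconds(10_000, 0.25))      # 2500.0 (about 42 minutes)
print(estimate_listing_seconds(10_000, 0.25, 20))  # 125.0 (about 2 minutes)
```

The point is only that the wait grows linearly with the number of partitions
when the listing is done one request at a time, which is why it can look as
though the job is stuck before any map task has started.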
If you have premium support from Amazon you may w