Re: Hive on EMR on S3 : Beginner

Pedro Figueiredo Sat, 25 Aug 2012 01:31:19 -0700

Hi,

On 25 Aug 2012, at 05:58, Ravi Shetye <ravi.she...@vizury.com> wrote:


> Thanks Richin and Pedro,
> So a final clarification
> Another way of doing it, apart from dynamic partitioning, is to create 
> your directories like below, either manually or in the ETL process you might be 
> using to get the table data; it is pretty easy.
> 
>       s3://ravi/logs/adv_id=123/date=2012-01-01/log.gz
>       s3://ravi/logs/adv_id=456/date=2012-01-02/log.gz
>       s3://ravi/logs/adv_id=123/date=2012-01-03/log.gz
> 
> 1) Since I have used PARTITIONED BY (adv_id STRING, date STRING), Hive 
> will read the bucket name adv_id=123 and understand that the data within this 
> bucket can be accessed by a pseudo column adv_id?

Yes.

> 2) It would be wrong if I use PARTITIONED BY (date STRING,adv_id STRING) and 
> keep the same bucket structure?

Yes, the order of the fields in PARTITIONED BY must match the nesting order of the directories.
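
A minimal sketch of a table definition that matches your layout. The table name logs and the single data column line are placeholders I made up; only the partition columns and the S3 paths come from your example:

```sql
-- Hypothetical table: 'logs' and the column 'line' are illustrative names only.
-- date is backticked in case your Hive version treats it as a keyword.
CREATE EXTERNAL TABLE logs (
  line STRING
)
PARTITIONED BY (adv_id STRING, `date` STRING)  -- same order as adv_id=.../date=... on S3
LOCATION 's3://ravi/logs/';

-- On EMR's Hive: scan the table location and register every
-- adv_id=.../date=... directory found there as a partition.
ALTER TABLE logs RECOVER PARTITIONS;

-- The partition columns can then be filtered like ordinary columns.
SELECT COUNT(*) FROM logs WHERE adv_id = '123' AND `date` = '2012-01-01';
```

ALTER TABLE ... RECOVER PARTITIONS is specific to the EMR build of Hive; on stock Apache Hive the closest equivalent should be MSCK REPAIR TABLE logs.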

> 3) Also, it won't work if I store data in s3://ravi/logs/123/2012-01-01/log.gz ?

No, that won't work; the directory names need the xxx= prefix (e.g. adv_id=123/date=2012-01-01).
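
One caveat: the xxx= naming matters when Hive has to discover the partitions from the directory names. If the directories cannot be renamed, a partition can also be attached by hand with an explicit location. A sketch, assuming a hypothetical table named logs that is PARTITIONED BY (adv_id STRING, date STRING):

```sql
-- 'logs' is an assumed table name; the path is the un-prefixed layout from question 3.
-- Each partition has to be registered individually this way.
ALTER TABLE logs ADD PARTITION (adv_id = '123', `date` = '2012-01-01')
LOCATION 's3://ravi/logs/123/2012-01-01/';
```

This gets tedious for many partitions, which is why the adv_id=.../date=... layout is usually the easier route.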

Cheers,

Pedro
