My gut feeling is that this is not something you should do (except for fun!). I'm fairly confident that somewhere in Hive, MR, or Tez you'll hit code that requires consistent, atomic move/copy/list/overwrite semantics from the warehouse filesystem, and this is not something the vanilla S3AFileSystem can provide. Even if you get to the point where everything appears functionally sound, I expect you'll encounter unusual and inconsistent behavior if you use this long-term.
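To make the rename point concrete: HDFS rename is a single atomic metadata operation, while a flat object store has no rename at all, so clients emulate it as copy-then-delete. A minimal sketch of that failure mode (the FlatObjectStore class and fail_after_copy flag are purely illustrative assumptions, not the actual S3AFileSystem code):

```python
class FlatObjectStore:
    """Toy model of an S3-like store: flat keys, no directories, no atomic rename."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def rename(self, src, dst, fail_after_copy=False):
        """Emulated rename as copy-then-delete; returns a boolean,
        mirroring Hadoop FileSystem.rename's 'returned false' contract."""
        if src not in self.objects:
            return False
        self.objects[dst] = self.objects[src]   # step 1: copy the object
        if fail_after_copy:                     # simulated failure between the two steps
            return False
        del self.objects[src]                   # step 2: delete the source
        return True

store = FlatObjectStore()
store.put("staging/000000_0", b"rows")
# A failure between copy and delete leaves the object visible at BOTH keys,
# and the caller only sees rename(...) == False:
ok = store.rename("staging/000000_0", "final/000000_0", fail_after_copy=True)
print(ok)                     # False
print(sorted(store.objects))  # ['final/000000_0', 'staging/000000_0']
```

A crashed or failed rename therefore cannot be distinguished from "source never existed" without extra bookkeeping, which is the gap S3Guard and EMRFS's consistency layer aim to close.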
Solutions for Hive on S3 include:

- Use S3Guard (not yet available): https://issues.apache.org/jira/browse/HADOOP-13345
- Use Hive on EMR with Amazon's S3 filesystem implementation, EMRFS. Note that this confusingly requires and overloads the 's3://' scheme.

Hope this helps, and please report back with any findings, as we are doing quite a bit of Hive in AWS too.

Cheers - Elliot.

On 15 November 2016 at 15:19, Stephen Sprague <sprag...@gmail.com> wrote:
> no, permissions are good. I believe the issue is that s3a does not have
> "move" and/or "rename" semantics, but I can't be the first one to encounter
> this. Somebody out there must have gone down this path before me.
>
> Searching the web, I found this:
>
> https://issues.apache.org/jira/browse/HIVE-14270
>
> which is part of an even larger body of S3 work (see the related JIRAs
> that it falls under, especially the Hadoop uber-JIRA).
>
> So after digging through those JIRAs, let me ask:
>
> has anyone set hive.metastore.warehouse.dir to an s3a location with
> success?
>
> It seems to me Hive 2.2.0 and perhaps Hadoop 2.7 or 2.8 are the only
> chances of success, but I'm happy to be told I'm wrong.
>
> thanks,
> Stephen.
>
> On Mon, Nov 14, 2016 at 10:25 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Is it a permission issue on the folder?
>>
>> On 15 Nov 2016, at 06:28, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>> so I figured I'd try setting hive.metastore.warehouse.dir=s3a://bucket/hive
>> and see what would happen.
>>
>> Running this query:
>>
>> insert overwrite table omniture.hit_data_aws partition (date_key=20161113)
>> select * from staging.hit_data_aws_ext_20161113 limit 1;
>>
>> yields this error:
>>
>> Failed with exception java.io.IOException: rename for src path:
>> s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/.hive-staging_hive_2016-11-15_04-57-52_085_7825126612479617470-1/-ext-10000/000000_0
>> to dest path:
>> s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/000000_0
>> returned false
>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask.
>> java.io.IOException: rename for src path:
>> s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/.hive-staging_hive_2016-11-15_04-57-52_085_7825126612479617470-1/-ext-10000/000000_0
>> to dest path:
>> s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/000000_0
>> returned false
>>
>> Is there any workaround? I'm running Hive 2.1.0 and Hadoop version
>> 2.6.0-cdh5.7.1.
>>
>> thanks,
>> Stephen.
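For anyone following along, the setting under discussion lives in hive-site.xml. A minimal fragment (the bucket name here is a placeholder, matching the s3a://bucket/hive example above, not a recommendation):

```xml
<!-- hive-site.xml: point the Hive warehouse at an S3A location.
     As the thread notes, vanilla S3A has no atomic rename, so the final
     MoveTask of INSERT OVERWRITE may fail with "rename ... returned false".
     A common alternative on this stack is to keep the warehouse on HDFS and
     give individual external tables an s3a:// LOCATION instead. -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://bucket/hive</value>
</property>
```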