Re: Hive ExIm from on-premise HDP to Amazon EMR

2016-01-25 Thread Elliot West
Yes, we do use Falcon, but only a small fraction of the datasets we wish to replicate are defined in this way. Could I perhaps just declare the feeds in Falcon and not the processes that create them? Also, doesn't Falcon use Hive ExIm/Replication to achieve this internally and therefore might I

Re: Hive ExIm from on-premise HDP to Amazon EMR

2016-01-24 Thread Artem Ervits
Have you looked at Apache Falcon?
On Jan 8, 2016 2:41 AM, "Elliot West" wrote:
> Further investigation appears to show this going wrong in a copy phase of
> the plan. The correctly functioning HDFS → HDFS import copy stage looks
> like this:
>
> STAGE PLANS:
>   Stage: Stage-1
>     Copy
>

Re: Hive ExIm from on-premise HDP to Amazon EMR

2016-01-07 Thread Elliot West
Further investigation appears to show this going wrong in a copy phase of the plan. The correctly functioning HDFS → HDFS import copy stage looks like this:
STAGE PLANS:
  Stage: Stage-1
    Copy
      source: hdfs://host:8020/staging/my_table/year_month=2015-12
      destination: hdfs://host:8020

Hive ExIm from on-premise HDP to Amazon EMR

2016-01-07 Thread Elliot West
Hello,
Following on from my earlier post concerning syncing Hive data from an on-premise cluster to the cloud, I've been experimenting with the IMPORT/EXPORT functionality to move data from an on-premise HDP cluster to Amazon EMR. I started out with some simple exports/imports as these can be the