Hive tables can sit on top of S3 storage, so you don't really need a separate export process.
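
As a minimal sketch of what that could look like: the Python helper below builds a HiveQL `CREATE EXTERNAL TABLE` statement whose `LOCATION` points at S3. The table name `status_history`, its columns, the tab-delimited text format, and the bucket path are all hypothetical and not taken from this thread.

```python
from typing import List, Tuple


def external_table_ddl(table: str, columns: List[Tuple[str, str]], s3_location: str) -> str:
    """Build a HiveQL statement for an external table whose data lives in S3."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = external_table_ddl(
    "status_history",
    [
        ("user_id", "BIGINT"),
        ("status", "STRING"),
        ("location", "STRING"),
        ("changed_at", "STRING"),
    ],
    "s3://example-bucket/hive/status_history/",
)
print(ddl)
```

Because the table is external, dropping it (or losing the cluster) removes only the metadata; the files stay in S3, and re-running the same statement on a new cluster attaches the table to them again.
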
thanks,
Shrikanth

On May 15, 2012, at 11:35 AM, Jon Palmer wrote:

> That seems like a very reasonable approach. However, if we use a technology
> like Amazon Elastic Map Reduce my Hive cluster is (potentially) going to be
> destroyed and recreated. As a result I'd really need to export the update
> history Hive table to some other store (like S3) so that it can be
> re-imported on the next spin up of the Hive cluster. Do I have that right?
>
> Jon
>
> -----Original Message-----
> From: shrikanth shankar [mailto:sshan...@qubole.com]
> Sent: Tuesday, May 15, 2012 1:14 PM
> To: user@hive.apache.org
> Subject: Re: What's the right data storage/representation?
>
> I would agree on keeping track of the history of updates in a separate table
> in Hive (you may not need to maintain it in the application tier). This
> pattern seems to be the "Slowly Changing Dimension" pattern used in other
> (more traditional) Data Warehouses... I suspect the challenge here would be
> writing a ETL process to maintain the Hive table based on the current status
> of the application db table ..
>
> Shrikanth
> On May 15, 2012, at 9:41 AM, Owen O'Malley wrote:
>
>> On Tue, May 15, 2012 at 5:11 AM, Jon Palmer <jpal...@care.com> wrote:
>>> I can see a few potential solutions:
>>>
>>> 1. Don't solve it. Accept that you have some artifacts in your
>>> reporting data that cannot be recovered from the source data.
>>>
>>> 2. Create status and location history tables in the application db and
>>> use that during the analytics process.
>>>
>>> 3. Log the status and location change 'events' to some other log file
>>> and use those logs in the Hive analysis.
>>
>> I would probably create a Hive table that includes the status and
>> location updates. One of the advantages of Hive & Hadoop is that it is
>> easy to store the raw information in bulk and continue to process it.
>> Once you have the information, you will likely find new uses for it.
>>
>> -- Owen
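
The "Slowly Changing Dimension" idea raised in the thread can be illustrated with a small sketch. This is one common variant (often called Type 2), written in plain Python rather than HiveQL, and it is an assumption about how the ETL step might work, not the approach anyone in the thread actually implemented. The `HistoryRow` fields, the `apply_snapshot` helper, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HistoryRow:
    user_id: int
    status: str
    location: str
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


def apply_snapshot(
    history: List[HistoryRow],
    snapshot: Dict[int, Tuple[str, str]],
    as_of: str,
) -> List[HistoryRow]:
    """Compare the application db's current state with the history table.

    Changed rows are closed (valid_to is set) and a new version is appended;
    unchanged rows are left alone.
    """
    current = {row.user_id: row for row in history if row.valid_to is None}
    for user_id, (status, location) in snapshot.items():
        row = current.get(user_id)
        if row is not None and (row.status, row.location) == (status, location):
            continue  # nothing changed for this user
        if row is not None:
            row.valid_to = as_of  # close the superseded version
        history.append(HistoryRow(user_id, status, location, as_of))
    return history


# Hypothetical daily snapshots of the application db table.
history: List[HistoryRow] = []
apply_snapshot(history, {1: ("active", "Boston")}, "2012-05-14")
apply_snapshot(history, {1: ("active", "Austin")}, "2012-05-15")
apply_snapshot(history, {1: ("active", "Austin")}, "2012-05-16")  # no change

for row in history:
    print(row)
```

After the three snapshots the history holds two rows for user 1: the Boston version closed on 2012-05-15 and the still-open Austin version. In Hive itself the same comparison would more likely be written as a query that rewrites the history table or a partition of it, since the sketch mutates rows in place only for brevity.
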