Re: load data unit of work

2011-06-15 Thread W S Chung
If that is the case, I'll just need to clean up the partially loaded HDFS file in a background job. That should do. On Wed, Jun 15, 2011 at 3:28 PM, Guy Bayes wrote: > I think if you load a file, validate it, and then *alter table add partition* to the final table at the end, in the event of cr

Re: load data unit of work

2011-06-15 Thread Guy Bayes
I think if you load a file, validate it, and then *alter table add partition* to the final table at the end, in the event of a crash you only have a partially loaded ETL file that no one will be querying anyway. That should work, though I am not speaking from personal experience, at least not with H
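
The load, validate, then add-partition sequence suggested above can be sketched as an ordered plan. This is only an illustration under assumptions, not something posted in the thread: the helper `staged_load_plan`, the table `events`, the partition column `dt`, and the staging path are all hypothetical, and the validation step is a placeholder that merely lists the staged directory.

```python
def staged_load_plan(local_file, table, part_col, part_val, staging_dir):
    """Build the ordered steps for a load that is only published at the end.

    All names are hypothetical. Nothing is executed here; the caller would
    run each step in order and stop at the first failure.
    """
    return [
        # 1. Copy the file to a staging directory that no table points at yet.
        ("copy", f"hadoop fs -put {local_file} {staging_dir}/"),
        # 2. Placeholder check; real validation (row counts, checksums)
        #    is left to the caller.
        ("validate", f"hadoop fs -ls {staging_dir}"),
        # 3. Publish: only now does the data become visible to queries.
        ("publish",
         f"ALTER TABLE {table} ADD PARTITION ({part_col}='{part_val}') "
         f"LOCATION '{staging_dir}'"),
    ]


plan = staged_load_plan("part-0001.txt", "events", "dt", "2011-06-15",
                        "/staging/events/dt=2011-06-15")
for step, command in plan:
    print(step, "->", command)
```

If a crash happens before the publish step, the table has no partition pointing at the staging directory, so only the orphaned staging files need to be cleaned up.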

Re: load data unit of work

2011-06-15 Thread W S Chung
If the load failure is severe enough, such as the whole machine crashing, there might not be an opportunity to catch the exception and clean up the partition right away. The best I can think of is to clean up the partition in a background job that runs reasonably regularly. In that case, before the

Re: load data unit of work

2011-06-14 Thread Guy Bayes
The easiest way to achieve a level of robustness is probably to load into a partition and then truncate the partition in the event of failure. Cleaning up after an incomplete load is a problem in many traditional RDBMSs; you cannot always rely on rollback functionality. No explicit deletes in Hive t
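
A minimal sketch of the load-into-a-partition, drop-on-failure idea, assuming a caller-supplied `run_hql` function that executes one HiveQL statement and raises on error. That function, the helper `load_partition`, the table `events`, and the paths are hypothetical; dropping the partition stands in here for "truncating" it. As a later reply points out, this only helps when the failure can actually be caught.

```python
def load_partition(run_hql, table, part_spec, hdfs_path):
    """Load a file into one partition; drop that partition if the load fails.

    `run_hql` is a hypothetical callable that runs a single HiveQL statement
    and raises an exception on failure.
    """
    load = (f"LOAD DATA INPATH '{hdfs_path}' "
            f"INTO TABLE {table} PARTITION ({part_spec})")
    drop = f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({part_spec})"
    try:
        run_hql(load)
    except Exception:
        # Remove whatever was partially loaded, then report the failure.
        run_hql(drop)
        raise


# Demonstration with a fake runner that fails on the LOAD statement.
executed = []

def failing_runner(statement):
    executed.append(statement)
    if statement.startswith("LOAD"):
        raise RuntimeError("simulated load failure")

try:
    load_partition(failing_runner, "events", "dt='2011-06-14'",
                   "/incoming/events.txt")
except RuntimeError:
    pass

print(executed)
```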

Re: load data unit of work

2011-06-14 Thread W S Chung
My question is a "what if" question, not a production issue. It seems natural, when replacing a traditional database with Hive, to ask how much robustness is sacrificed for scalability. My concern is that if a file is partially loaded, there might not be an easy way to clean up the already loaded dat

Re: load data unit of work

2011-06-13 Thread Martin Konicek
Hi, I think this is a problem with open source in general and sometimes it can be very frustrating. However, your question is more of a "what if" question - you're not in the situation of having found a horrible bug after deploying to production, am I right? Regarding your question, I would gues

load data unit of work

2011-06-13 Thread W S Chung
I submitted a question like this before, but somehow that question was never delivered. I can even find my question in Google. Since I cannot find any admin e-mail or feedback form on the Hive website where I can ask why the last question was not delivered, there is not much option other than to post the qu