Re: Adding update/delete to the hive-hcatalog-streaming API

2015-04-01 Thread Elliot West
Hi Alan, Regarding the streaming changes, I've raised an issue and submitted patches here: https://issues.apache.org/jira/browse/HIVE-10165 Thanks - Elliot. On 26 March 2015 at 23:20, Alan Gates wrote: > > > Elliot West > March 26, 2015 at 15:58 > Hi Alan, > > Yes, this is precisely our si

RE: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Mich Talebzadeh
From: Elliot West [mailto:tea...@gmail.com] Sent: 26 March 2015 23:04 To: user@hive.apache.org Subject: Re: Adding update/delete to the hive-hcatalog-streaming API Hi Mich, Yes, we have a timestamp on each record. Our processes effectively group by a key and order by timestamp

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Alan Gates
Elliot West March 26, 2015 at 15:58 Hi Alan, Yes, this is precisely our situation. The issues I'm having with the current API are that I cannot intercept the creation of the OrcRecordUpdater to set the recordIdColumn in the AcidOutputFormat.Options instance. Additi

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Elliot West
Hi Mich, Yes, we have a timestamp on each record. Our processes effectively group by a key and order by timestamp. Cheers - Elliot.
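
For readers following the thread, the "group by a key and order by timestamp" step described above might look roughly like the sketch below. It is only an illustration under assumptions: the record layout (dicts with hypothetical `key`, `ts`, and `value` fields) and the function name `latest_per_key` are not from the thread, and the actual processes discussed here run against Hive and ORC data rather than in-memory lists.

```python
from collections import defaultdict
from operator import itemgetter


def latest_per_key(records):
    """Group records by their 'key' field and keep the newest one per group.

    Each record is assumed (hypothetically) to be a dict with 'key', 'ts'
    and 'value' fields, where 'ts' is a comparable timestamp.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["key"]].append(rec)
    # Within each group, the record with the greatest timestamp wins.
    return {k: max(recs, key=itemgetter("ts")) for k, recs in groups.items()}


if __name__ == "__main__":
    incoming = [
        {"key": "a", "ts": 1, "value": "x"},
        {"key": "b", "ts": 2, "value": "y"},
        {"key": "a", "ts": 3, "value": "z"},
    ]
    for key, rec in sorted(latest_per_key(incoming).items()):
        print(key, rec["ts"], rec["value"])
```

The newest record per key is what would then be compared with the existing partition contents to decide whether a change is an insert, an update, or a delete.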

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Elliot West
Hi Alan, Yes, this is precisely our situation. The issues I'm having with the current API are that I cannot intercept the creation of the OrcRecordUpdater to set the recordIdColumn in the AcidOutputFormat.Options instance. Additionally, I cannot extend the TransactionBatch interface to expose furt

RE: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Mich Talebzadeh
Elliot West [mailto:tea...@gmail.com] Sent: 26 March 2015 22:10 To: user@hive.apache.org Subject: Re: Adding update/delete to the hive-hcatalog-streaming API Hi, thanks for your quick reply. I see your point, but in my case would I not have the required RecordIdentifiers available as I

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Alan Gates
Are you saying that when the records arrive you don't know updates from inserts and you're already doing processing to determine that? If so, this is exactly the case we'd like to hit with the merge functionality. If you're already scanning the existing ORC file and obtaining the unique ident

RE: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Mich Talebzadeh
...@gmail.com] Sent: 26 March 2015 22:10 To: user@hive.apache.org Subject: Re: Adding update/delete to the hive-hcatalog-streaming API Hi, thanks for your quick reply. I see your point, but in my case would I not have the required RecordIdentifiers available as I'm already reading the e

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Elliot West
Hi, thanks for your quick reply. I see your point, but in my case would I not have the required RecordIdentifiers available as I'm already reading the entire partition to determine which records have changed? Admittedly Hive will not reveal the ROW__IDs to me but I assume (incorrectly perhaps) th

RE: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Mich Talebzadeh
From: Alan Gates [mailto:alanfga...@gmail.com] Sent: 26 March 2015 21:48 To: user@hive.apache.org Subject: Re: Adding update/delete to the hive-hcatalog-streaming API The missing piece for adding update and delete

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Alan Gates
The missing piece for adding update and delete to the streaming API is a primary key. Updates and deletes in SQL work by scanning the table or partition where the record resides. This is assumed to be ok since we are not supporting transactional workloads and thus update/deletes are assumed t

Adding update/delete to the hive-hcatalog-streaming API

2015-03-26 Thread Elliot West
Hi, I'd like to ascertain if it might be possible to add 'update' and 'delete' operations to the hive-hcatalog-streaming API. I've been looking at the API with interest for the last week as it appears to have the potential to help with some general data processing patterns that are prevalent where