[ https://issues.apache.org/jira/browse/HADOOP-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HADOOP-8126.
------------------------------------

    Resolution: Invalid

Note to self: Don't try to open JIRAs on the iPhone.
                
> [Coprocessors] Add hooks for bulk loading actions
> -------------------------------------------------
>
>                 Key: HADOOP-8126
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8126
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>
> The API gap for bulk HFile loading was discussed on the mailing list but it 
> didn't make it into a JIRA. It also came up on HBASE-5498. 
> See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
> The salient detail:
> {quote}
> A simple and straightforward course of action is to give the CP the option of 
> rewriting the submitted store file(s) before the regionserver attempts to 
> validate and move them into the store. This is similar to how CPs are hooked 
> into compaction: CPs hook compaction by allowing one to wrap the scanner that 
> is iterating over the store files. So the wrapper gets a chance to examine 
> the KeyValues being processed and also has an opportunity to modify or drop 
> them.
> Similarly for incoming HFiles for bulk load, the CP could be given a scanner 
> iterating over those files, if you had a RegionObserver installed. You would 
> be given the option in effect to rewrite the incoming HFiles before they are 
> handed over to the RegionServer for addition to the region.
> {quote}
> I think this is a reasonable approach to interface design, because being 
> given a scanner highlights the bulk nature of the input. However, I think 
> there should be two hooks here: one that gives a simple yes/no answer as to 
> whether the bulk load should proceed, and one that allows for more expensive 
> filtering or transformation via a scanner-like interface. Bulk loads can be 
> very large, so always requiring a scan over them is not a good idea.
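
The two-hook design proposed in the issue can be sketched in Java. This is a minimal illustration under stated assumptions, not the HBase coprocessor API. `BulkLoadObserver`, `preBulkLoadCheck`, `wrapBulkLoadScanner`, the `Cell` stand-in for a KeyValue, the `simulateBulkLoad` driver and the `/staging/f1` path are all hypothetical names. A plain `Iterator` stands in for the region server's scanner. The cheap hook sees only the submitted file paths. The optional hook wraps the scanner, the same way compaction hooks do, so it can drop or rewrite cells.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BulkLoadHookSketch {

    // Hypothetical stand-in for an HBase KeyValue: just a row key and a value.
    static final class Cell {
        final String row;
        final String value;
        Cell(String row, String value) { this.row = row; this.value = value; }
        @Override public String toString() { return row + "=" + value; }
    }

    // Hypothetical two-hook observer; NOT the real HBase RegionObserver API.
    interface BulkLoadObserver {
        // Hook 1, cheap: a yes/no gate that sees only the submitted file paths.
        boolean preBulkLoadCheck(List<String> hfilePaths);

        // Hook 2, optional and expensive: wrap the scanner over the incoming cells
        // to examine, rewrite, or drop them. Returning null means "no rewrite",
        // so the server can move the files without reading them.
        default Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> incoming) {
            return null;
        }
    }

    // Example observer: rejects empty loads and drops cells whose row starts with "tmp_".
    static final class DropTempRows implements BulkLoadObserver {
        @Override public boolean preBulkLoadCheck(List<String> hfilePaths) {
            return !hfilePaths.isEmpty();
        }

        @Override public Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> incoming) {
            return new Iterator<Cell>() {
                private Cell pending; // next cell that passed the filter, or null

                @Override public boolean hasNext() {
                    while (pending == null && incoming.hasNext()) {
                        Cell c = incoming.next();
                        if (!c.row.startsWith("tmp_")) pending = c;
                    }
                    return pending != null;
                }

                @Override public Cell next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    Cell result = pending;
                    pending = null;
                    return result;
                }
            };
        }
    }

    // Simulated region-server flow: run the cheap gate first and only pay for a
    // full pass when an observer asks to rewrite. Returns null if the load is rejected.
    static List<Cell> simulateBulkLoad(BulkLoadObserver obs, List<String> paths, List<Cell> cells) {
        if (!obs.preBulkLoadCheck(paths)) return null;
        Iterator<Cell> wrapped = obs.wrapBulkLoadScanner(cells.iterator());
        if (wrapped == null) return cells; // no rewrite requested: load as-is, no scan
        List<Cell> loaded = new ArrayList<>();
        while (wrapped.hasNext()) loaded.add(wrapped.next());
        return loaded;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(new Cell("r1", "a"), new Cell("tmp_x", "b"), new Cell("r2", "c"));
        System.out.println(simulateBulkLoad(new DropTempRows(), List.of("/staging/f1"), cells)); // prints [r1=a, r2=c]
        System.out.println(simulateBulkLoad(new DropTempRows(), List.of(), cells)); // prints null
    }
}
```

In this sketch, an observer that implements only the gate gets the default `null` from the scanner hook. The simulated server then accepts the cells untouched and never iterates over them. Only an observer that explicitly asks to rewrite pays for a full pass over the load. This reflects the concern that bulk loads can be very large.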

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
