[ https://issues.apache.org/jira/browse/HADOOP-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HADOOP-8126.
------------------------------------

    Resolution: Invalid

Note to self: Don't try to open jiras on the iPhone.

> [Coprocessors] Add hooks for bulk loading actions
> -------------------------------------------------
>
>                 Key: HADOOP-8126
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8126
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>
> The API gap for bulk HFile loading was discussed on the mailing list but it
> didn't make it into a JIRA. It also came up on HBASE-5498.
> See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
> The salient detail:
> {quote}
> A simple and straightforward course of action is to give the CP the option of
> rewriting the submitted store file(s) before the regionserver attempts to
> validate and move them into the store. This is similar to how CPs are hooked
> into compaction: CPs hook compaction by allowing one to wrap the scanner that
> is iterating over the store files. So the wrapper gets a chance to examine
> the KeyValues being processed and also has an opportunity to modify or drop
> them.
> Similarly for incoming HFiles for bulk load, the CP could be given a scanner
> iterating over those files, if you had a RegionObserver installed. You would
> be given the option in effect to rewrite the incoming HFiles before they are
> handed over to the RegionServer for addition to the region.
> {quote}
> I think this is a reasonable approach to interface design, because the fact
> you are given a scanner highlights the bulk nature of the input. However I
> think there should be two hooks here: one that allows for a simple yes/no
> answer as to whether the bulk load should proceed; and one that allows for a
> more expensive filtering or transformation or whatever via scanner-like
> interface. Bulk loads could be potentially very large so requiring a scan
> over them always is not a good idea.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
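
For illustration only, the two-hook idea in the quoted description could be sketched roughly as follows. This is not the HBase coprocessor API: BulkLoadObserver, preBulkLoad, wrapBulkLoadScanner, Entry and bulkLoad are all hypothetical names, and a plain Iterator stands in for the scanner over the incoming HFiles. The point of the sketch is that the cheap yes/no hook runs first, and the host only pays for a full scan when the observer actually returns a wrapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BulkLoadHooksSketch {

    /** Hypothetical stand-in for a KeyValue; not the real HBase class. */
    static final class Entry {
        final String row;
        final String value;
        Entry(String row, String value) { this.row = row; this.value = value; }
    }

    /** Hypothetical observer exposing the two hooks discussed in the issue. */
    interface BulkLoadObserver {
        /** Cheap hook: approve or veto the load without reading the files. */
        default boolean preBulkLoad(List<String> hfilePaths) { return true; }

        /** Expensive hook: return the same scanner to opt out, or a wrapper to rewrite. */
        default Iterator<Entry> wrapBulkLoadScanner(Iterator<Entry> scanner) { return scanner; }
    }

    /** Example observer that drops every entry whose row starts with a prefix. */
    static final class DropPrefixObserver implements BulkLoadObserver {
        private final String prefix;
        DropPrefixObserver(String prefix) { this.prefix = prefix; }

        @Override
        public Iterator<Entry> wrapBulkLoadScanner(final Iterator<Entry> scanner) {
            return new Iterator<Entry>() {
                private Entry next = advance();

                // Skip forward to the next entry that survives the filter.
                private Entry advance() {
                    while (scanner.hasNext()) {
                        Entry e = scanner.next();
                        if (!e.row.startsWith(prefix)) return e;
                    }
                    return null;
                }

                @Override public boolean hasNext() { return next != null; }

                @Override public Entry next() {
                    if (next == null) throw new NoSuchElementException();
                    Entry out = next;
                    next = advance();
                    return out;
                }
            };
        }
    }

    /** Hypothetical host-side flow: veto check first, scan only if a wrapper was supplied. */
    static List<Entry> bulkLoad(BulkLoadObserver observer, List<String> paths, List<Entry> contents) {
        if (!observer.preBulkLoad(paths)) {
            throw new IllegalStateException("bulk load vetoed by observer");
        }
        Iterator<Entry> raw = contents.iterator();
        Iterator<Entry> wrapped = observer.wrapBulkLoadScanner(raw);
        if (wrapped == raw) {
            return contents; // observer opted out: files would be moved as-is, no scan
        }
        List<Entry> rewritten = new ArrayList<>();
        while (wrapped.hasNext()) {
            rewritten.add(wrapped.next());
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<Entry> incoming = Arrays.asList(
                new Entry("row1", "a"), new Entry("tmp-row", "b"), new Entry("row2", "c"));
        List<Entry> loaded = bulkLoad(
                new DropPrefixObserver("tmp-"), Arrays.asList("/staging/f1"), incoming);
        System.out.println("entries loaded: " + loaded.size());
    }
}
```

An observer that overrides neither hook leaves the load untouched, which matches the concern that very large bulk loads should not always require a scan.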