Hi St.Ack/Wei-Chiu,

It is very kind of St.Ack to bring this question to HDFS dev. I think this is a good feature to have. As for the branch question, the HDFS-9924 branch is already open, so we could just use that, and I am +1 on adding Duo as a branch committer.

I am not familiar with the HBase code base, so I am presuming that there will be some deviation from the current design doc posted in HDFS-9924. Would it make sense to post a new design proposal on HDFS-9924?

--Anu

On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:

Given that HBase 2 uses async output by default, the way that code is maintained today in HBase is not sustainable. That piece of code should be maintained in HDFS. I am +1 as a participant in both communities.

On Thu, May 3, 2018 at 9:14 AM, Stack <st...@duboce.net> wrote:

> Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
> client?
>
> Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
> Access". On the foot of this umbrella JIRA is a proposal by the
> heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
> (written by Duo) that we use making Write-Ahead Logs. We call it
> AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
>
> Let me quote Duo from his proposal at the base of HDFS-9924:
>
> ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
> is expected that things like HBASE-20244
> <https://issues.apache.org/jira/browse/HBASE-20244>
> ["NoSuchMethodException when retrieving private method
> decryptEncryptedDataEncryptionKey from DFSClient"] will happen again and
> again.
>
> To make life easier, we need to move the async output related code into
> HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
> work, so I would like to create a feature branch to implement the async dfs
> client. In general I think there are 4 steps:
>
> 1. Implement an async rpc client with option 3 [1] described above.
> 2. Implement the filesystem APIs which only need to connect to NN, such as
> 'mkdirs'.
> 3. Implement async file read. The problem is the API. For pread I think a
> CompletableFuture is enough, the problem is for the streaming read. Need to
> discuss later.
> 4. Implement async file write. The API will also be a problem, but a more
> important problem is that, if we want to support fan-out, the current logic
> at DN side will make the semantic broken as we can read uncommitted data
> very easily. In HBase it is solved by HBASE-14004
> <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
> should keep the broken behavior in HDFS. We need to find a way to deal with
> it.
>
> Comments welcome.
>
> Intent is to make a branch named HDFS-9924 (or should we just do a new
> JIRA?) and to add Duo as a feature branch committer. If all goes well,
> we'll call for a merge VOTE.
>
> Thanks,
> St.Ack
>
> 1. Option 3: "Use the old protobuf rpc interface and implement a new rpc
> framework. The benefit is that we also do not need port unification service
> at server side and do not need to maintain two implementations at server
> side. And one more thing is that we do not need to upgrade protobuf to
> 3.x."
>

--
A very happy Hadoop contributor
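
For reference on step 3 of the quoted proposal ("For pread I think a CompletableFuture is enough"), here is a rough sketch of what a future-returning positional read could look like. This is only an illustration under stated assumptions: the names AsyncPositionedReadable, LocalAsyncReader and AsyncPreadDemo are hypothetical and are not part of HDFS or of the HDFS-9924 patch, and a local file read through java.nio's AsynchronousFileChannel stands in for a DataNode read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

/** Hypothetical API shape for an async pread; not an existing HDFS interface. */
interface AsyncPositionedReadable {
    /** Reads into dst starting at the given file offset; completes with bytes read, or -1 at EOF. */
    CompletableFuture<Integer> pread(long position, ByteBuffer dst);
}

/** Stand-in implementation backed by a local file instead of an HDFS block reader. */
final class LocalAsyncReader implements AsyncPositionedReadable, AutoCloseable {
    private final AsynchronousFileChannel channel;

    LocalAsyncReader(Path path) throws IOException {
        this.channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
    }

    @Override
    public CompletableFuture<Integer> pread(long position, ByteBuffer dst) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        // Bridge the callback-style NIO read into a CompletableFuture.
        channel.read(dst, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                future.complete(bytesRead);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                future.completeExceptionally(error);
            }
        });
        return future;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}

final class AsyncPreadDemo {
    /** Blocks on the future only for demonstration; real callers would chain callbacks. */
    static String readAt(Path path, long position, int length) throws Exception {
        try (LocalAsyncReader reader = new LocalAsyncReader(path)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            int n = reader.pread(position, buf).get();
            return new String(buf.array(), 0, Math.max(n, 0), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread-demo", ".txt");
        Files.write(tmp, "hello async world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAt(tmp, 6, 5));
        Files.delete(tmp);
    }
}
```

A single future per call fits pread because each request has one offset and one result. It does not obviously extend to streaming reads, which is the open API question the proposal defers.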