Thanks for the support, Wei-Chiu and Anu. Thinking more on it, we should just open a new JIRA. HDFS-9924 has an old branch with commits we don't need, and it is full of commentary that is, ahem, a mite off-topic. Duo can attach his design to the new issue. We can cite HDFS-9924 as provenance and aggregate its discussion in the new issue as a launching pad for the new effort.
Hopefully this is agreeable,
Thanks,
S

On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <aengin...@hortonworks.com> wrote:

> Hi St.ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question, HDFS-9924 branch is
> already open, we could just use that and I am +1 on adding Duo as a branch
> committer.
>
> I am not familiar with HBase code base, I am presuming that there will be
> some deviation from the current design doc posted in HDFS-9924. Would it be
> make sense to post a new design proposal on HDFS-9924?
>
> --Anu
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:
>
> > Given that HBase 2 uses async output by default, the way that code is
> > maintained today in HBase is not sustainable. That piece of code should
> > be maintained in HDFS. I am +1 as a participant in both communities.
> >
> > On Thu, May 3, 2018 at 9:14 AM, Stack <st...@duboce.net> wrote:
> >
> > > Ok with you lot if a few of us open a branch to work on a non-blocking
> > > HDFS client?
> > >
> > > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking
> > > HDFS Access". On the foot of this umbrella JIRA is a proposal by the
> > > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
> > > client (written by Duo) that we use making Write-Ahead Logs. We call it
> > > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
> > >
> > > Let me quote Duo from his proposal at the base of HDFS-9924:
> > >
> > > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL,
> > > so it is expected that things like HBASE-20244
> > > <https://issues.apache.org/jira/browse/HBASE-20244>
> > > ["NoSuchMethodException when retrieving private method
> > > decryptEncryptedDataEncryptionKey from DFSClient"] will happen again
> > > and again.
> > >
> > > To make life easier, we need to move the async output related code into
> > > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1]
> > > can work, so I would like to create a feature branch to implement the
> > > async dfs client. In general I think there are 4 steps:
> > >
> > > 1. Implement an async rpc client with option 3 [1] described above.
> > > 2. Implement the filesystem APIs which only need to connect to NN, such
> > >    as 'mkdirs'.
> > > 3. Implement async file read. The problem is the API. For pread I think
> > >    a CompletableFuture is enough, the problem is for the streaming
> > >    read. Need to discuss later.
> > > 4. Implement async file write. The API will also be a problem, but a
> > >    more important problem is that, if we want to support fan-out, the
> > >    current logic at DN side will make the semantic broken as we can
> > >    read uncommitted data very easily. In HBase it is solved by
> > >    HBASE-14004 <https://issues.apache.org/jira/browse/HBASE-14004> but
> > >    I do not think we should keep the broken behavior in HDFS. We need
> > >    to find a way to deal with it.
> > >
> > > Comments welcome.
> > >
> > > Intent is to make a branch named HDFS-9924 (or should we just do a new
> > > JIRA?) and to add Duo as a feature branch committer. If all goes well,
> > > we'll call for a merge VOTE.
> > >
> > > Thanks,
> > > St.Ack
> > >
> > > 1. Option 3: "Use the old protobuf rpc interface and implement a new
> > > rpc framework. The benefit is that we also do not need port unification
> > > service at server side and do not need to maintain two implementations
> > > at server side. And one more thing is that we do not need to upgrade
> > > protobuf to 3.x."
> >
> > --
> > A very happy Hadoop contributor