Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

Anu Engineer Thu, 03 May 2018 13:55:12 -0700

Hi St.ack/Wei-Chiu,

It is very kind of St.Ack to bring this question to HDFS Dev. I think this is a 
good feature to have. As for the branch question, the HDFS-9924 branch is 
already open, so we could just use that, and I am +1 on adding Duo as a 
branch committer.


I am not familiar with the HBase code base, but I presume there will be some 
deviation from the current design doc posted in HDFS-9924. Would it make sense 
to post a new design proposal on HDFS-9924? 

--Anu



On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:

    Given that HBase 2 uses async output by default, the way that code is
    maintained today in HBase is not sustainable. That piece of code should be
    maintained in HDFS. I am +1 as a participant in both communities.
    
    On Thu, May 3, 2018 at 9:14 AM, Stack <st...@duboce.net> wrote:
    
    > Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
    > client?
    >
    > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
    > Access". On the foot of this umbrella JIRA is a proposal by the
    > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
    > (written by Duo) that we use making Write-Ahead Logs. We call it
    > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
    >
    > Let me quote Duo from his proposal at the base of HDFS-9924:
    >
    > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
    > is expected that things like HBASE-20244
    > <https://issues.apache.org/jira/browse/HBASE-20244>
    > ["NoSuchMethodException
    > when retrieving private method decryptEncryptedDataEncryptionKey from
    > DFSClient"] will happen again and again.
    >
    > To make life easier, we need to move the async output related code into
    > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
    > work, so I would like to create a feature branch to implement the async dfs
    > client. In general I think there are 4 steps:
    >
    > 1. Implement an async rpc client with option 3 [1] described above.
    > 2. Implement the filesystem APIs which only need to connect to NN, such as
    > 'mkdirs'.
    > 3. Implement async file read. The problem is the API. For pread I think a
    > CompletableFuture is enough, the problem is for the streaming read. Need to
    > discuss later.
    > 4. Implement async file write. The API will also be a problem, but a more
    > important problem is that, if we want to support fan-out, the current logic
    > at DN side will make the semantic broken as we can read uncommitted data
    > very easily. In HBase it is solved by HBASE-14004
    > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
    > should keep the broken behavior in HDFS. We need to find a way to deal with
    > it.
    >
    > Comments welcome.
    >
    > Intent is to make a branch named HDFS-9924 (or should we just do a new
    > JIRA?) and to add Duo as a feature branch committer. If all goes well,
    > we'll call for a merge VOTE.
    >
    > Thanks,
    > St.Ack
    >
    > 1. Option 3: "Use the old protobuf rpc interface and implement a new rpc
    > framework. The benefit is that we also do not need port unification service
    > at server side and do not need to maintain two implementations at server
    > side. And one more thing is that we do not need to upgrade protobuf to
    > 3.x."
    >
    
    
    
    -- 
    A very happy Hadoop contributor
    


---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
