Re:Re:Re: Refactor Doris's IO Stack

陈明雨 Wed, 27 Apr 2022 00:46:15 -0700

Added write privileges for plat1ko [1].

[1] https://cwiki.apache.org/confluence/display/DORIS/DSIP-006%3A+Refactor+IO+stack





--

Best Regards
陈明雨 Mingyu Chen

Email: [email protected]





On 2022-03-31 14:41:35, "陈明雨" <[email protected]> wrote:
>Hi Guolei,
>I have created DSIP-006 for this proposal:
>https://cwiki.apache.org/confluence/display/DORIS/DSIP-006%3A+Refactor+IO+stack
>
>On 2022-03-30 12:35:44, "王博" <[email protected]> wrote:
>>+1
>>Looking forward to Guolei's DSIP.
>>
>>GuoLei Yi <[email protected]> wrote on Tue, 29 Mar 2022 14:17:
>>
>>> Currently, there are various interfaces for file IO operations in Doris:
>>>
>>>    - The query layer has FileReader and FileWriter, with implementations
>>>    for HDFS, S3, Broker, and Local.
>>>    - The storage layer has a BlockManager that abstracts blocks, with
>>>    WriteableFileBlock and ReadableFileBlock.
>>>    - For directory management, there is an Env interface that covers
>>>    directory operations, with RemoteEnv and PosixEnv implementations;
>>>    BlockManager also contains some operations such as linking files and
>>>    deleting blocks. In addition, for S3 and HDFS, classes such as
>>>    S3StorageBackend contain their own file and directory operations,
>>>    including mkdir, copy, and rm.
>>>
>>> Having so many ways to perform the same operations causes the following
>>> problems:
>>>
>>>    - It is messy: it is sometimes unclear which interface to use, and many
>>>    functions are duplicated under different abstraction names.
>>>    - Changing a feature or fixing a bug requires modifications in multiple
>>>    places. For example, adding a local cache for S3 reads would have to be
>>>    done in every one of these places.
>>>
>>> We need to unify the IO stack. File IO access can be roughly divided into
>>> three types:
>>>
>>>    - Directory operations: create files, delete files, list files, etc.
>>>    - File write operations
>>>    - File read operations
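The three operation types above could map onto three small interfaces. Below is a minimal C++17 sketch under stated assumptions: the class and method names (FileSystem, FileReader, FileWriter, read_at, create_file, LocalFileSystem, and so on) are hypothetical, reuse the FileReader/FileWriter names only for readability, and are not the existing Doris classes or the final DSIP-006 API. Only a local-disk backend built on std::filesystem and <fstream> is shown; an S3, HDFS, or Broker backend would implement the same three interfaces.

```cpp
// Hypothetical sketch of a unified IO layer: one directory interface
// (FileSystem) plus one reader and one writer interface (local backend only).
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// File read operations: random-access reads by offset.
class FileReader {
public:
    virtual ~FileReader() = default;
    // Returns up to n bytes starting at offset (fewer near end of file).
    virtual std::string read_at(uint64_t offset, size_t n) = 0;
    virtual uint64_t size() const = 0;
};

// File write operations: sequential appends, then close.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual bool append(const std::string& data) = 0;
    virtual bool close() = 0;
};

// Directory operations: create, open, delete, and list files.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual std::unique_ptr<FileWriter> create_file(const std::string& path) = 0;
    virtual std::unique_ptr<FileReader> open_file(const std::string& path) = 0;
    virtual bool delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& dir) = 0;
};

class LocalFileReader : public FileReader {
public:
    explicit LocalFileReader(const std::string& path)
        : in_(path, std::ios::binary), size_(fs::file_size(path)) {}
    std::string read_at(uint64_t offset, size_t n) override {
        if (offset >= size_) return {};
        n = static_cast<size_t>(std::min<uint64_t>(n, size_ - offset));
        std::string buf(n, '\0');
        in_.clear();  // reset EOF/fail bits left by an earlier read
        in_.seekg(static_cast<std::streamoff>(offset));
        in_.read(buf.data(), static_cast<std::streamsize>(n));
        buf.resize(static_cast<size_t>(in_.gcount()));
        return buf;
    }
    uint64_t size() const override { return size_; }
private:
    std::ifstream in_;
    uint64_t size_;
};

class LocalFileWriter : public FileWriter {
public:
    explicit LocalFileWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc) {}
    bool append(const std::string& data) override {
        out_.write(data.data(), static_cast<std::streamsize>(data.size()));
        return static_cast<bool>(out_);
    }
    bool close() override {
        out_.close();
        return !out_.fail();
    }
private:
    std::ofstream out_;
};

class LocalFileSystem : public FileSystem {
public:
    std::unique_ptr<FileWriter> create_file(const std::string& path) override {
        std::error_code ec;  // ignored: a failed open shows up in append/close
        fs::create_directories(fs::path(path).parent_path(), ec);
        return std::make_unique<LocalFileWriter>(path);
    }
    std::unique_ptr<FileReader> open_file(const std::string& path) override {
        std::error_code ec;
        if (!fs::is_regular_file(path, ec)) return nullptr;
        return std::make_unique<LocalFileReader>(path);
    }
    bool delete_file(const std::string& path) override {
        std::error_code ec;
        return fs::remove(path, ec);
    }
    // Sorted file names in dir; empty if dir does not exist.
    std::vector<std::string> list(const std::string& dir) override {
        std::vector<std::string> names;
        std::error_code ec;
        for (const auto& entry : fs::directory_iterator(dir, ec))
            names.push_back(entry.path().filename().string());
        std::sort(names.begin(), names.end());
        return names;
    }
};
```

Callers in this sketch depend only on FileSystem, FileReader, and FileWriter, so swapping LocalFileSystem for another backend would not change calling code. Errors are reduced to bool/nullptr returns to keep it short; a real implementation would more likely return a richer status type.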
>>>
>>> We could then implement these APIs for each storage backend:
>>>
>>>    - Local file
>>>    - S3 file
>>>    - HDFS file
>>>    - Broker
>>>
>>> Once implemented, the unified APIs can be used by the storage layer
>>> (hot/cold data separation, storage-compute separation), the query layer
>>> (querying S3 and HDFS), backup and restore, and so on, avoiding duplicated
>>> development and maintenance.
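To illustrate the S3 local-cache example from the problem list: once every backend implements a single reader interface, a cache can be written once as a decorator instead of being added to each reader. This is a hedged sketch, not part of the proposal: FileReader, InMemoryReader (a stand-in for a remote S3 reader that counts backend calls), CachingFileReader, and the block_size parameter are all hypothetical, and the cache is an unbounded in-memory std::map with no eviction.

```cpp
// Hypothetical sketch: a block cache written once as a FileReader decorator.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

class FileReader {
public:
    virtual ~FileReader() = default;
    virtual std::string read_at(uint64_t offset, size_t n) = 0;
    virtual uint64_t size() const = 0;
};

// Stand-in for a remote reader (e.g. S3); counts how often it is called.
class InMemoryReader : public FileReader {
public:
    explicit InMemoryReader(std::string data) : data_(std::move(data)) {}
    std::string read_at(uint64_t offset, size_t n) override {
        ++remote_reads;
        if (offset >= data_.size()) return {};
        return data_.substr(static_cast<size_t>(offset), n);
    }
    uint64_t size() const override { return data_.size(); }
    int remote_reads = 0;
private:
    std::string data_;
};

// Caches fixed-size blocks of any underlying reader in memory (no eviction).
class CachingFileReader : public FileReader {
public:
    CachingFileReader(std::unique_ptr<FileReader> inner, size_t block_size)
        : inner_(std::move(inner)), block_size_(block_size) {
        assert(inner_ != nullptr && block_size_ > 0);
    }
    std::string read_at(uint64_t offset, size_t n) override {
        std::string out;
        const uint64_t end = std::min<uint64_t>(offset + n, inner_->size());
        while (offset < end) {
            const uint64_t block = offset / block_size_;
            const std::string& data = load_block(block);
            const size_t start = static_cast<size_t>(offset - block * block_size_);
            if (start >= data.size()) break;  // backend returned a short block
            const size_t take = static_cast<size_t>(
                std::min<uint64_t>(end - offset, data.size() - start));
            out.append(data, start, take);
            offset += take;
        }
        return out;
    }
    uint64_t size() const override { return inner_->size(); }
private:
    // Fetches a block from the inner reader on first use, then serves it
    // from the map (std::map references stay valid across insertions).
    const std::string& load_block(uint64_t block) {
        auto it = cache_.find(block);
        if (it == cache_.end()) {
            it = cache_.emplace(block, inner_->read_at(block * block_size_, block_size_)).first;
        }
        return it->second;
    }
    std::unique_ptr<FileReader> inner_;
    size_t block_size_;
    std::map<uint64_t, std::string> cache_;
};
```

Because CachingFileReader depends only on the FileReader interface, the same wrapper would apply unchanged to an S3, HDFS, or Broker reader, which is the "fix it once" benefit the proposal describes.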
>>>
>>> --
>>> Guolei Yi
>>> Tel:134-3991-0228
>>> Email:[email protected]
>>>
>>
>>
>>-- 
>>王博  Wang Bo
