Hi Guolei,
I have created DSIP-006 for this proposal:
https://cwiki.apache.org/confluence/display/DORIS/DSIP-006%3A+Refactor+IO+stack




--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmin...@apache.org





On 2022-03-30 12:35:44, "王博" <wangbo13...@gmail.com> wrote:
>+1
>Looking forward to Teacher Guolei's DSIP.
>
>GuoLei Yi <yiguo...@gmail.com> wrote on Tue, Mar 29, 2022 at 14:17:
>
>> Currently, there are various interfaces for file IO operations in Doris:
>>
>>    - There are FileReader and FileWriter in the query layer, with
>>    corresponding implementations for HDFS, S3, Broker, and Local.
>>    - In the storage layer, there is a BlockManager that abstracts blocks,
>>    with WriteableFileBlock and ReadableFileBlock.
>>    - For directory management, there is an Env interface that covers
>>    directory operations, with RemoteEnv and PosixEnv implementations.
>>    BlockManager also has some operations to link files and delete blocks.
>>    In addition, for S3 and HDFS, classes such as S3StorageBackend contain
>>    some file and directory operations, including mkdir, copy, and rm.
>>
>> Having so many ways to do IO causes the following problems:
>>
>>    - It's messy. Sometimes I don't know which one to use, and many
>>    functions are duplicated under different abstraction names.
>>    - Adding a feature or fixing a bug requires changes in multiple
>>    places. For example, if we want to read from S3 with a local cache,
>>    the cache has to be added in every place that reads from S3.
>>
>> We need to unify the IO stack. In fact, access to IO can be roughly divided
>> into the following three types:
>>
>>    - Directory operations: create files, delete files, list files, etc.
>>    - File write operation
>>    - File read operation
>>
>> We could then implement these APIs for different storage backends:
>>
>>
>>    - Local file
>>    - S3 file
>>    - HDFS file
>>    - Broker
>>
>> Once implemented, they can be used in the storage layer (hot/cold data
>> separation, separation of storage and compute), the query layer (querying
>> S3, querying HDFS), backup and recovery, etc., which avoids duplicated
>> development and maintenance.
>>
>> --
>> Guolei Yi
>> Tel:134-3991-0228
>> Email:yiguo...@gmail.com
>>
>
>
>-- 
>王博  Wang Bo
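
For illustration only, the three operation types in the proposal above
(directory operations, file write, file read) could be sketched roughly as
follows. This is a minimal sketch, not the DSIP-006 design or existing Doris
code: the names FileSystem, LocalFileSystem, and CachedFileSystem are
hypothetical, only a local backend is shown, the cache is a simple in-memory
stand-in for a local cache, and Python is used only for brevity.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class FileSystem(ABC):
    """Hypothetical unified interface: directory ops, file write, file read."""

    @abstractmethod
    def list_files(self, directory: str) -> List[str]: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalFileSystem(FileSystem):
    """Local-disk backend; S3, HDFS, or Broker would implement the same API."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _abs(self, path: str) -> str:
        return os.path.join(self.root, path)

    def list_files(self, directory: str) -> List[str]:
        return sorted(os.listdir(self._abs(directory)))

    def delete(self, path: str) -> None:
        os.remove(self._abs(path))

    def write(self, path: str, data: bytes) -> None:
        full = self._abs(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(self._abs(path), "rb") as f:
            return f.read()


class CachedFileSystem(FileSystem):
    """Read cache written once as a wrapper, usable with any backend."""

    def __init__(self, inner: FileSystem) -> None:
        self.inner = inner
        self.cache: Dict[str, bytes] = {}
        self.misses = 0  # number of reads that had to go to the backend

    def list_files(self, directory: str) -> List[str]:
        return self.inner.list_files(directory)

    def delete(self, path: str) -> None:
        self.cache.pop(path, None)  # drop any stale cached copy
        self.inner.delete(path)

    def write(self, path: str, data: bytes) -> None:
        self.cache.pop(path, None)  # invalidate before overwriting
        self.inner.write(path, data)

    def read(self, path: str) -> bytes:
        if path not in self.cache:
            self.misses += 1
            self.cache[path] = self.inner.read(path)
        return self.cache[path]


def demo() -> bytes:
    # Callers depend only on FileSystem, so the backend can be swapped.
    fs = CachedFileSystem(LocalFileSystem(tempfile.mkdtemp()))
    fs.write("segment/0.dat", b"hello")
    return fs.read("segment/0.dat")
```

With one interface, a concern such as caching is added in a single wrapper
instead of in every reader, which is the duplication problem described in the
proposal.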
