+1
Looking forward to Teacher Guolei's DSIP.

On Tue, Mar 29, 2022 at 14:17, GuoLei Yi <yiguo...@gmail.com> wrote:

> Currently, there are various interfaces for file IO operations in Doris:
>
>    - There are FileReader and FileWriter in the query layer. There are
>    corresponding implementations for HDFS, S3, Broker, and Local.
>    - The storage layer has a BlockManager that abstracts blocks, with
>    WriteableFileBlock and ReadableFileBlock.
>    - For directory management, there is an Env interface that covers
>    directory operations, with RemoteEnv and PosixEnv implementations.
>    BlockManager also contains some operations for linking files and
>    deleting blocks. In addition, for S3 and HDFS, classes such as
>    S3StorageBackend carry their own file and directory operations,
>    including mkdir, copy, and rm.
>
> Having so many ways to do the same thing causes the following problems:
>
>    - It is messy: it is often unclear which interface to use, and many
>    functions are duplicated under different abstraction names.
>    - Adding a feature or fixing a bug requires changes in multiple
>    places. For example, if we want S3 reads to go through a local cache,
>    the cache has to be added in every one of those places.
>
> We need to unify the IO stack. In fact, access to IO can be roughly divided
> into the following three types:
>
>    - Directory operations: create files, delete files, list files, etc.
>    - File write operations
>    - File read operations
>
> We could then implement these APIs for different storage backends:
>
>    - Local file
>    - S3 file
>    - HDFS file
>    - Broker
>
> Once implemented, the unified API can be used in the storage layer
> (hot/cold separation, separation of storage and compute), the query layer
> (querying S3 and HDFS), backup and recovery, and so on, which avoids
> duplicated development and maintenance.
>
> --
> Guolei Yi
> Tel:134-3991-0228
> Email:yiguo...@gmail.com
>
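
To make the three operation types above a bit more concrete, here is a
rough C++ sketch of what a unified interface might look like. Every name
in it is hypothetical: FileSystem, MemFileSystem and copy_file do not
exist in Doris, and FileReader/FileWriter here are simplified stand-ins,
not the existing query-layer classes. The in-memory backend only takes the
place of Local, S3, HDFS, or Broker so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; they are not existing Doris classes.

// File write operation: append-only handle for one file.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual bool append(const std::string& data) = 0;
};

// File read operation: positional reads from one file.
class FileReader {
public:
    virtual ~FileReader() = default;
    virtual std::size_t size() const = 0;
    // Copies up to `len` bytes starting at `offset` into `out`; returns the count.
    virtual std::size_t read_at(std::size_t offset, std::size_t len,
                                std::string* out) const = 0;
};

// Directory operations, plus factories for the two handle types.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual std::unique_ptr<FileWriter> create_file(const std::string& path) = 0;
    // Returns nullptr if the file does not exist.
    virtual std::unique_ptr<FileReader> open_file(const std::string& path) = 0;
    virtual bool delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& prefix) const = 0;
};

// Toy in-memory backend, standing in for Local / S3 / HDFS / Broker.
class MemFileSystem : public FileSystem {
    using Buf = std::shared_ptr<std::string>;

    struct Writer : FileWriter {
        explicit Writer(Buf b) : buf(std::move(b)) {}
        bool append(const std::string& data) override { buf->append(data); return true; }
        Buf buf;
    };
    struct Reader : FileReader {
        explicit Reader(Buf b) : buf(std::move(b)) {}
        std::size_t size() const override { return buf->size(); }
        std::size_t read_at(std::size_t offset, std::size_t len,
                            std::string* out) const override {
            assert(out != nullptr);
            if (offset >= buf->size()) { out->clear(); return 0; }
            *out = buf->substr(offset, len);  // substr clamps len to the end
            return out->size();
        }
        Buf buf;
    };

    std::map<std::string, Buf> files_;

public:
    std::unique_ptr<FileWriter> create_file(const std::string& path) override {
        Buf b = std::make_shared<std::string>();
        files_[path] = b;  // replaces any existing file at this path
        return std::unique_ptr<FileWriter>(new Writer(b));
    }
    std::unique_ptr<FileReader> open_file(const std::string& path) override {
        auto it = files_.find(path);
        if (it == files_.end()) return nullptr;
        return std::unique_ptr<FileReader>(new Reader(it->second));
    }
    bool delete_file(const std::string& path) override { return files_.erase(path) > 0; }
    std::vector<std::string> list(const std::string& prefix) const override {
        std::vector<std::string> out;
        // Keys sharing a prefix are contiguous in the sorted map.
        for (auto it = files_.lower_bound(prefix);
             it != files_.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
            out.push_back(it->first);
        }
        return out;
    }
};

// Backend-agnostic helper: it only sees the FileSystem interface.
bool copy_file(FileSystem& src, const std::string& from,
               FileSystem& dst, const std::string& to) {
    std::unique_ptr<FileReader> reader = src.open_file(from);
    if (!reader) return false;
    std::string data;
    reader->read_at(0, reader->size(), &data);
    return dst.create_file(to)->append(data);
}
```

Because copy_file depends only on the FileSystem interface, the same
helper would work across any pair of backends. Under the same assumption,
a local cache for S3 could be one FileSystem wrapper instead of a change
in every caller.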


-- 
王博  Wang Bo
