+1. Looking forward to Teacher Guolei's DSIP.

GuoLei Yi <yiguo...@gmail.com> wrote on Tue, Mar 29, 2022 at 14:17:
> Currently, there are various interfaces for file IO operations in Doris:
>
> - The query layer has FileReader and FileWriter, with corresponding
>   implementations for HDFS, S3, Broker, and Local.
> - The storage layer has a BlockManager that abstracts blocks, with
>   WriteableFileBlock and ReadableFileBlock.
> - For directory management, there is an Env interface that covers
>   directory operations, with RemoteEnv and PosixEnv implementations.
>   BlockManager also has some operations such as linking files and
>   deleting blocks. In addition, for S3 and HDFS, classes such as
>   S3StorageBackend contain file and directory operations, including
>   mkdir, copy, and rm.
>
> Having so many ways to do the same thing causes the following problems:
>
> - It is messy. Sometimes I don't know which one to use; many functions
>   are duplicated under different abstract names.
> - Changing a feature or fixing a bug requires changes in multiple
>   places. For example, if we want to read from S3 with a local cache,
>   the cache has to be added in all of those places.
>
> We need to unify the IO stack. In fact, IO access can be roughly divided
> into the following three types:
>
> - Directory operations: create files, delete files, get file list, etc.
> - File write operations
> - File read operations
>
> We could implement these APIs for different storage backends:
>
> - Local file
> - S3 file
> - HDFS file
> - Broker
>
> Once implemented, they can be used in the storage layer (separation of hot
> and cold data, separation of storage and compute), the query layer (query
> S3, query HDFS), backup and recovery, etc., avoiding repeated development
> and maintenance.
>
> --
> Guolei Yi
> Tel:134-3991-0228
> Email:yiguo...@gmail.com
>

--
王博 Wang Bo
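
P.S. To make the three-way split in the quoted proposal concrete, here is a rough sketch of how it could look. This is only an illustration in Python, not the actual Doris code and not the design the DSIP will necessarily adopt. The method names (`read_at`, `append`, `create_file`, `open_file`, `delete_file`, `list_files`), the `LocalFileSystem` class, and the `copy_file` helper are all hypothetical names chosen for this sketch.

```python
import os
from abc import ABC, abstractmethod
from typing import List


class FileReader(ABC):
    """Read side: positional reads from a single file."""

    @abstractmethod
    def read_at(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class FileWriter(ABC):
    """Write side: append-only writes to a single file."""

    @abstractmethod
    def append(self, data: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class FileSystem(ABC):
    """Directory side: create, open, delete, and list files."""

    @abstractmethod
    def create_file(self, path: str) -> FileWriter: ...

    @abstractmethod
    def open_file(self, path: str) -> FileReader: ...

    @abstractmethod
    def delete_file(self, path: str) -> None: ...

    @abstractmethod
    def list_files(self, directory: str) -> List[str]: ...


# One backend (local disk). S3, HDFS, or Broker backends would
# implement the same three interfaces (hypothetical, not shown here).
class LocalFileReader(FileReader):
    def __init__(self, path: str) -> None:
        self._f = open(path, "rb")

    def read_at(self, offset: int, length: int) -> bytes:
        self._f.seek(offset)
        return self._f.read(length)

    def size(self) -> int:
        return os.fstat(self._f.fileno()).st_size

    def close(self) -> None:
        self._f.close()


class LocalFileWriter(FileWriter):
    def __init__(self, path: str) -> None:
        self._f = open(path, "wb")

    def append(self, data: bytes) -> None:
        self._f.write(data)

    def close(self) -> None:
        self._f.close()


class LocalFileSystem(FileSystem):
    def create_file(self, path: str) -> FileWriter:
        return LocalFileWriter(path)

    def open_file(self, path: str) -> FileReader:
        return LocalFileReader(path)

    def delete_file(self, path: str) -> None:
        os.remove(path)

    def list_files(self, directory: str) -> List[str]:
        return sorted(os.listdir(directory))


def copy_file(src_fs: FileSystem, src_path: str,
              dst_fs: FileSystem, dst_path: str, chunk: int = 4096) -> None:
    """Caller code depends only on the interfaces, never on a backend."""
    reader = src_fs.open_file(src_path)
    writer = dst_fs.create_file(dst_path)
    try:
        offset, total = 0, reader.size()
        while offset < total:
            data = reader.read_at(offset, chunk)
            if not data:  # defensive: stop if the backend returns nothing
                break
            writer.append(data)
            offset += len(data)
    finally:
        reader.close()
        writer.close()
```

With something like this, a helper such as `copy_file` is written once and works across any pair of backends, and a feature like a local cache for S3 reads could be added as one more `FileReader` wrapper instead of being patched into every call site.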