Thanks for your advice. I will follow your suggestion and replace the existing usages step by step.

陈明雨 <morning...@163.com> wrote on Tue, Mar 29, 2022 at 22:55:

> Indeed, we need to refactor the IO layer to make it clearer and more
> extensible.
> The basic purpose is that when a new kind of file system is introduced, we
> only need to implement a new derived class for it, with no need to modify
> any other interface in the upper layers.
>
> BTW, for now, if we change the IO interface, it will impact many places.
> So how about doing this in two steps:
>
> 1. Rewrite the IO stack in totally new files, and leave the current
>    implementations alone, for easy reviewing.
> 2. Use the new IO stack to replace the current calls.
>
> --
> Best Regards
> 陈明雨 Mingyu Chen
>
> Email:
> chenmin...@apache.org
>
> At 2022-03-29 14:17:24, "GuoLei Yi" <yiguo...@gmail.com> wrote:
> >Currently, there are various interfaces for file IO operations in Doris:
> >
> > - There are FileReader and FileWriter in the query layer, with
> >   corresponding implementations for HDFS, S3, Broker, and Local.
> > - In the storage layer, there is a BlockManager that abstracts Block;
> >   there are WriteableFileBlock and ReadableFileBlock.
> > - For directory management, there is an Env interface that includes
> >   directory operations, with RemoteEnv and PosixEnv; there are also
> >   some link-file and delete-block operations in BlockManager; in
> >   addition, for S3 and HDFS, there are classes such as S3StorageBackend
> >   that contain some file and directory operations, including mkdir,
> >   copy, and rm.
> >
> >Having so many ways to operate causes the following problems:
> >
> > - It's messy. Sometimes I don't know which one to use; many functions
> >   are duplicated but have different abstract names.
> > - Modifying a feature or fixing a bug requires changes in multiple
> >   places. For example, if we want to read S3 with a local cache, then
> >   it has to be added in all of those places.
> >
> >We need to unify the IO stack. In fact, access to IO can be roughly
> >divided into the following three types:
> >
> > - Directory operations: create files, delete files, get file list, etc.
> > - File write operations
> > - File read operations
> >
> >And we could implement these APIs for different storage backends:
> >
> > - Local file
> > - S3 file
> > - HDFS file
> > - Broker
> >
> >Once implemented, it can be used in the storage layer (separation of hot
> >and cold data, separation of storage and computing), the query layer
> >(query S3, query HDFS), backup and recovery, etc., to avoid repeated
> >development and maintenance.
> >
> >--
> >Guolei Yi
> >Tel:134-3991-0228
> >Email:yiguo...@gmail.com
>

--
Wishing you a pleasant day
衣国垒
Tsing Hua University
Tel:134-3991-0228
Email:yiguo...@gmail.com
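
The proposal in this thread (one interface covering directory operations, file writes, and file reads, with one derived class per storage backend) can be illustrated with a minimal sketch. It is written in Python for brevity and is not Doris code: FileSystem, LocalFileSystem, InMemoryFileSystem, and copy_file are hypothetical names chosen for illustration, and the in-memory class only stands in for a remote backend such as S3 or HDFS.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict, List


class FileSystem(ABC):
    """Hypothetical unified IO interface (illustrative names, not Doris APIs)."""

    # Directory operations
    @abstractmethod
    def list_files(self, directory: str) -> List[str]: ...

    @abstractmethod
    def delete_file(self, path: str) -> None: ...

    # File write operation
    @abstractmethod
    def open_writer(self, path: str) -> BinaryIO: ...

    # File read operation
    @abstractmethod
    def open_reader(self, path: str) -> BinaryIO: ...


class LocalFileSystem(FileSystem):
    """Backend for the local disk, built on the standard library."""

    def list_files(self, directory):
        return sorted(os.listdir(directory))

    def delete_file(self, path):
        os.remove(path)

    def open_writer(self, path):
        return open(path, "wb")

    def open_reader(self, path):
        return open(path, "rb")


class _MemoryWriter(io.BytesIO):
    """Buffer that publishes its bytes into the owning store when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store, self._path = store, path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFileSystem(FileSystem):
    """Stand-in for a remote backend; keeps file contents in a dict."""

    def __init__(self):
        self._files: Dict[str, bytes] = {}

    def list_files(self, directory):
        prefix = directory.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._files if p.startswith(prefix))

    def delete_file(self, path):
        del self._files[path]

    def open_writer(self, path):
        return _MemoryWriter(self._files, path)

    def open_reader(self, path):
        return io.BytesIO(self._files[path])


def copy_file(src_fs, src_path, dst_fs, dst_path):
    """Upper-layer code: it depends only on the FileSystem interface."""
    with src_fs.open_reader(src_path) as reader, dst_fs.open_writer(dst_path) as writer:
        writer.write(reader.read())


if __name__ == "__main__":
    local, remote = LocalFileSystem(), InMemoryFileSystem()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "segment.dat")
        with local.open_writer(src) as w:
            w.write(b"example bytes")
        copy_file(local, src, remote, "bucket/segment.dat")
    print(remote.list_files("bucket"))
```

Because copy_file touches only the abstract interface, supporting another backend in this sketch means adding one more derived class and leaving the upper-layer code unchanged, which is the property the refactoring aims for.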