Thanks for your advice. I will follow your suggestion and replace the
existing usages step by step.

陈明雨 <morning...@163.com> wrote on Tue, Mar 29, 2022 at 22:55:

> Indeed, we need to refactor the IO layer to make it clearer and more
> extensible.
> The basic goal is that when a new kind of file system is introduced, we
> only need to implement a new derived class for it, without modifying any
> other interface in the upper layers.
>
>
> BTW, changing the IO interface now will affect a lot of places.
> So how about doing this in 2 steps:
>
>
> 1. Rewrite the IO stack in entirely new files and leave the current
> implementation alone, for easier reviewing.
> 2. Use the new IO stack to replace the current calls.
>
>
>
> --
>
> Best Regards
> 陈明雨 Mingyu Chen
>
> Email:
> chenmin...@apache.org
>
>
>
>
>
> At 2022-03-29 14:17:24, "GuoLei Yi" <yiguo...@gmail.com> wrote:
> >Currently, Doris has several separate interfaces for file IO operations:
> >
> >   - The query layer has FileReader and FileWriter, with corresponding
> >   implementations for HDFS, S3, Broker, and Local.
> >   - The storage layer has a BlockManager that abstracts blocks, with
> >   WriteableFileBlock and ReadableFileBlock.
> >   - For directory management, there is an Env interface that covers
> >   directory operations, with RemoteEnv and PosixEnv implementations.
> >   BlockManager also contains some file-linking and block-deletion
> >   operations. In addition, for S3 and HDFS there are classes such as
> >   S3StorageBackend that provide file and directory operations such as
> >   mkdir, copy, and rm.
> >
> >Having so many ways to do IO causes the following problems:
> >
> >   - It is messy. It is sometimes unclear which interface to use, and many
> >   functions are duplicated under different abstraction names.
> >   - Modifying a feature or fixing a bug requires changes in multiple
> >   places. For example, if we want to add a local cache for S3 reads, the
> >   cache has to be added in every place that reads from S3.
> >
> >We need to unify the IO stack. In fact, IO access can be roughly divided
> >into the following three types:
> >
> >   - Directory operations: create files, delete files, list files, etc.
> >   - File write operations
> >   - File read operations
> >
> >We could then implement these APIs for the different storage backends:
> >
> >   - Local file
> >   - S3 file
> >   - HDFS file
> >   - Broker
> >
> >Once implemented, it can be used in the storage layer (hot and cold data
> >separation, separation of storage and compute), the query layer (querying
> >S3, querying HDFS), backup and recovery, etc., avoiding duplicated
> >development and maintenance.
> >
> >--
> >Guolei Yi
> >Tel:134-3991-0228
> >Email:yiguo...@gmail.com
>
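
To make the proposal quoted above concrete, here is a minimal sketch of how
the three kinds of access (directory operations, file write, file read) could
sit behind one interface with one derived class per backend. Every class and
method name below is hypothetical and chosen only for illustration; this is
not the existing Doris API, and only a local backend is shown. It assumes a
C++17 compiler for std::filesystem.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <vector>

// All names here are hypothetical; they are not taken from the Doris code base.

// File read operation.
class FileReader {
public:
    virtual ~FileReader() = default;
    // Reads up to `len` bytes starting at `offset`; returns the bytes actually read.
    virtual std::string read_at(std::size_t offset, std::size_t len) = 0;
};

// File write operation.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual bool append(const std::string& data) = 0;
    virtual bool close() = 0;
};

// Directory operations, plus factories for readers and writers.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual bool create_directory(const std::string& path) = 0;
    virtual bool delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& dir) = 0;
    virtual std::unique_ptr<FileReader> open_reader(const std::string& path) = 0;
    virtual std::unique_ptr<FileWriter> open_writer(const std::string& path) = 0;
};

class LocalFileReader : public FileReader {
public:
    explicit LocalFileReader(const std::string& path) : _in(path, std::ios::binary) {}
    std::string read_at(std::size_t offset, std::size_t len) override {
        std::string buf(len, '\0');
        _in.clear();  // reset EOF/fail flags left by an earlier short read
        _in.seekg(static_cast<std::streamoff>(offset));
        _in.read(&buf[0], static_cast<std::streamsize>(len));
        buf.resize(static_cast<std::size_t>(_in.gcount()));
        return buf;
    }
private:
    std::ifstream _in;
};

class LocalFileWriter : public FileWriter {
public:
    explicit LocalFileWriter(const std::string& path)
            : _out(path, std::ios::binary | std::ios::trunc) {}
    bool append(const std::string& data) override {
        _out.write(data.data(), static_cast<std::streamsize>(data.size()));
        return _out.good();
    }
    bool close() override {
        _out.close();
        return !_out.fail();
    }
private:
    std::ofstream _out;
};

// One derived class per backend; S3, HDFS, and Broker would follow the same shape.
class LocalFileSystem : public FileSystem {
public:
    bool create_directory(const std::string& path) override {
        std::error_code ec;
        std::filesystem::create_directories(path, ec);
        return !ec;  // an already existing directory is not an error
    }
    bool delete_file(const std::string& path) override {
        std::error_code ec;
        return std::filesystem::remove(path, ec);
    }
    std::vector<std::string> list(const std::string& dir) override {
        std::vector<std::string> names;
        std::error_code ec;
        for (const auto& entry : std::filesystem::directory_iterator(dir, ec)) {
            names.push_back(entry.path().filename().string());
        }
        return names;
    }
    std::unique_ptr<FileReader> open_reader(const std::string& path) override {
        return std::make_unique<LocalFileReader>(path);
    }
    std::unique_ptr<FileWriter> open_writer(const std::string& path) override {
        return std::make_unique<LocalFileWriter>(path);
    }
};

// Upper-layer code depends only on the FileSystem interface.
std::string write_then_read(FileSystem& fs, const std::string& path, const std::string& data) {
    auto writer = fs.open_writer(path);
    if (!writer->append(data) || !writer->close()) return "";
    return fs.open_reader(path)->read_at(0, data.size());
}
```

With this shape, a caller such as write_then_read() never names a concrete
backend, so cross-cutting features (for example a local cache wrapped around
a remote reader) could be added in one place. Real code would likely return
a status object rather than bool; that is simplified here.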


-- 
Wishing you a pleasant day

衣国垒
Tsing Hua University
Tel:134-3991-0228
Email:yiguo...@gmail.com
