Indeed, we need to refactor the IO layer to make it clearer and more
extensible. The basic goal is that when a new kind of file system is
introduced, we only need to implement a new derived class for it, without
modifying any other interface in the upper layers.
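
To make the "one derived class per file system" idea concrete, here is a
minimal sketch. All names (FileSystem, LocalFileSystem, copy_file) are
hypothetical and not taken from the Doris code base, and returning bool is a
simplification. A real version would probably return a status object and
expose streaming readers and writers rather than whole-file helpers.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical unified interface; the names are illustrative only.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    // Directory operations.
    virtual bool create_directory(const std::string& path) = 0;
    virtual bool delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& dir) = 0;
    // File write operation: replaces the file content with `data`.
    virtual bool write_file(const std::string& path, const std::string& data) = 0;
    // File read operation: reads the whole file into `out`.
    virtual bool read_file(const std::string& path, std::string* out) = 0;
};

// One derived class per storage backend; this one targets the local disk.
class LocalFileSystem : public FileSystem {
public:
    bool create_directory(const std::string& path) override {
        std::error_code ec;
        std::filesystem::create_directories(path, ec);
        return !ec;  // an already existing directory is not an error
    }
    bool delete_file(const std::string& path) override {
        std::error_code ec;
        return std::filesystem::remove(path, ec) && !ec;
    }
    std::vector<std::string> list(const std::string& dir) override {
        std::vector<std::string> names;
        std::error_code ec;
        for (std::filesystem::directory_iterator it(dir, ec), end;
             !ec && it != end; it.increment(ec)) {
            names.push_back(it->path().filename().string());
        }
        return names;
    }
    bool write_file(const std::string& path, const std::string& data) override {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        return out.good();
    }
    bool read_file(const std::string& path, std::string* out) override {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        out->assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
        return true;
    }
};

// Upper-layer code depends only on the abstract interface, so adding an
// S3 or HDFS subclass would not require touching this function.
bool copy_file(FileSystem& fs, const std::string& from, const std::string& to) {
    std::string data;
    return fs.read_file(from, &data) && fs.write_file(to, data);
}
```

With this shape, supporting another backend would mean adding one more
subclass of FileSystem, while callers such as copy_file stay unchanged.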


BTW, for now, if we change the IO interface, it will impact lots of places.
So how about doing this in 2 steps:


1. Rewrite the IO stack in totally new files, and leave the current
implementation alone, for easy reviewing.
2. Use the new IO stack to replace the current calls.
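
One possible way to carry out step 2 incrementally is a thin adapter: the
existing interface keeps its signature but delegates to the new stack, so
call sites can be switched one at a time. This is only a sketch under that
assumption. LegacyReader, NewFileReader, InMemoryReader, LegacyReaderAdapter
and count_bytes are hypothetical stand-ins, not the actual Doris classes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Step 1: the new stack lives in new files (hypothetical interface).
class NewFileReader {
public:
    virtual ~NewFileReader() = default;
    virtual bool read_all(std::string* out) = 0;
};

// In-memory stand-in for a real backend, so the sketch runs anywhere.
class InMemoryReader : public NewFileReader {
public:
    explicit InMemoryReader(std::string content) : _content(std::move(content)) {}
    bool read_all(std::string* out) override {
        *out = _content;
        return true;
    }
private:
    std::string _content;
};

// Stand-in for an interface that upper-layer code already calls today.
class LegacyReader {
public:
    virtual ~LegacyReader() = default;
    virtual std::string read() = 0;
};

// Step 2: keep the legacy signature, but delegate to the new stack.
class LegacyReaderAdapter : public LegacyReader {
public:
    explicit LegacyReaderAdapter(std::unique_ptr<NewFileReader> impl)
        : _impl(std::move(impl)) {}
    std::string read() override {
        std::string data;
        // Returns an empty string if the underlying read fails.
        return _impl->read_all(&data) ? data : std::string();
    }
private:
    std::unique_ptr<NewFileReader> _impl;
};

// An existing call site; it does not change during the migration.
std::size_t count_bytes(LegacyReader& reader) {
    return reader.read().size();
}
```

Once every call site uses the new interface directly, the adapter and the
old implementations could be removed.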



--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmin...@apache.org





At 2022-03-29 14:17:24, "GuoLei Yi" <yiguo...@gmail.com> wrote:
>Currently, there are various interfaces for file IO operations in Doris:
>
>   - There are FileReader and FileWriter in the query layer. There are
>   corresponding implementations for HDFS, S3, Broker, and Local.
>   - In the storage layer, there is a BlockManager that abstracts blocks,
>   with WriteableFileBlock and ReadableFileBlock.
>   - For directory management, there is an Env interface that covers
>   directory operations, with RemoteEnv and PosixEnv implementations.
>   BlockManager also contains some operations for linking files and
>   deleting blocks. In addition, for S3 and HDFS, there are classes such as
>   S3StorageBackend that contain some file and directory operations,
>   including mkdir, copy, and rm.
>
>Having so many ways to do IO causes the following problems:
>
>   - It's messy: sometimes I don't know which one to use, and many functions
>   are duplicated under different abstraction names;
>   - Modifying a feature or fixing a bug requires changes in multiple
>   places. For example, if we want to read from S3 with a local cache, the
>   cache has to be added in every one of those places;
>
>We need to unify the IO stack. In fact, access to IO can be roughly divided
>into the following three types:
>
>   - Directory operations: create files, delete files, get file list, etc.
>   - File write operation
>   - File read operation
>
>And we could implement these APIs for different storage backends:
>
>
>   - Local file
>   - S3 file
>   - HDFS file
>   - Broker
>
>Once implemented, they can be used in the storage layer (separation of hot
>and cold data, separation of storage and computing), the query layer
>(querying S3, querying HDFS), backup and recovery, etc., to avoid
>duplicated development and maintenance.
>
>-- 
>Guolei Yi
>Tel:134-3991-0228
>Email:yiguo...@gmail.com
