Nice proposal. This issue deserves to be discussed!
-- 此致!Best Regards 陈明雨 Mingyu Chen Email: chenmin...@apache.org At 2020-12-30 14:55:07, "lichaoyong" <lichaoy...@apache.org> wrote: >Array is used to store data like user labels. >It made up of (offset, element), element can be an Array. >[1, 2, 3], [4, 5, 6] >Doris has implemented array nest type. But it's not extensible. > 1. It stores NULL with offset, which only is feasible to Array, but not to >Struct. > Struct is made of {element 1, element2}, but does not have an implicit >offset column. > Array should store null as a new column, not attached to an offset >column. > 2. It stores offset with absolute ordinal, so every offset ColumnWriter >will have to > callback ArrayColumnWriter to record the position. The logic is so >tricky to understand. > 3. When reading the ArrayColumn, it has to seek the next position to get >the right end offset. > >The array will support the **array_slice, flatten, a[5]** function. >This function should eliminate the memory copy in Array. >So the memory layout will store the data in the bottom level. >Every function will return a new Array and only change set the offsets. >``` >The memory layout: > >Two-level data : [[1, 2], [3, 4]], [[5, 6, 7], null, [8]], [[9, 10]] > >* First Level Offsets (int32), First Level have no nulls > > | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | > |------------|------------|------------|-------------| > | 0 | 2 | 5 | 6 | > > * Second Nulls (uint8) > > | Byte 1- 3 | Bytes 4 | Bytes 5 - 7| > |-----------------------------------| > | 0 | 1 | 0 | > > * Second Level Offsets (int32) > > | Bytes 0-27 | > |----------------------| > | 0, 2, 4, 7, 7, 8, 10 | > > * Elements array (int32): > > | Bytes 0-9 | > |-------------------------------| > | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | >``` >Like the above, every level in the array only stores nulls and offsets. >And also, offsets will store zero to calculate the first size. > >The disk layout should take encoding algorithms and reading efficiency into >consideration. >So the disk layout will store the array_size without the absolute for every >array element. >``` >The disk layout >* First Level array_sizes (int32), First Level have no nulls > > | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | > |------------|------------|------------| > | 2 | 3 | 1 | > > * Second Nulls (uint8) > > | Byte 1- 3 | Bytes 4 | Bytes 5 - 7| > |-----------------------------------| > | 0 | 1 | 0 | > > * Second Level array_sizes (int32) > > | Bytes 0-23 | > |----------------------| > | 2, 2, 3, 0, 1, 2 | > > * Elements array (int32): > > | Bytes 0-9 | > |-------------------------------| > | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | >``` >For null, array_size, element array, it will construct the specified >ColumnWriter to write the data. >The ArrayColumnWriter only needs to call the three writers. >``` >1. Call the writer to write nulls. >2. Call the writer to write array_sizes. And also add a new meta to record >the corresponding relation between array_size ordinal > and element ordinal. >3. Call element writer recursively. >``` > >Upon read, when seeking to specify the ordinal, it will seek the null, >array_size, element separately. >When seeking the element column, it will get the start ordinal from >array_size reader. >Because the array_size has to seek the specified ordinal, so It only needs >one sum. > >Asides from the above read and write logic, I will refactor the TypeInfo >class to support nested types(Array/Struct/Map). >I divided into two function >``` >using TypeInfoPtr = std::shared_ptr<TypeInfo> >const TypeInfoPtr& get_type_info(FieldType type) >TypeInfoPtr get_type_info(const TabletColumn& column); >``` >The first interface is used to scalar type, and the second interface is >used to all types including nested types(Array/Struct/Map). >To alleviate the copy of shared_ptr, the scalar get_type_info interface >will return a const reference of TypeInfo.