HappenLee opened a new issue #3926:
URL: https://github.com/apache/incubator-doris/issues/3926


   ## Motivation
   At present, Doris queries often hit the ```mem_limit``` bottleneck, which 
prevents many of them from completing.
   
   We can sometimes work around this by raising the ```mem_limit``` of the 
query, but in some memory-constrained scenarios this does not help.
   
   Disk capacity is usually about 100 times that of memory. If we can spill 
the data that exceeds the memory limit to disk, the problem above is largely 
solved. However, disk is much slower than memory, so queries that spill will 
take longer to execute.
   
   ### Spilling to disk brings the following benefits:
   
   1. In memory-tight scenarios, a query can trade execution time for memory 
and still complete. This is necessary in some scenarios.
   
   2. Doris can process larger queries without being bound by memory 
constraints.
   
   
   ## Implementation
   
   1. ```BufferedBlockMgr2``` and ```DiskIOMgr``` already support spilling 
in-memory data to disk. We need to use them to write data to a temporary work 
area on disk. The default location of this work area is ```doris-scratch```. 
When an operation completes, its data is removed from the disk.
   
   2. There are 3 versions of ```BufferedTupleStream```, which is confusing. 
We need to unify this important abstraction so that spilling to disk can be 
built on it.
   
   3. Implement spilling to disk for the following execution nodes, one 
after another:
           
       * Sort
       * Aggregation
       * Analytic function
       * Join 
   
   4. Remove redundant code, such as ```BufferTupleStream```, ```HashTable``` 
and so on.
   
   5.  Some optimizations for spilling to disk:
        * A size limit for temporary files
        * An IO rate limit for spilling to disk
        * Making use of the IO capability of SSDs
        * Compression and decompression of spilled data
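
   As a rough illustration of item 1, the sketch below buffers rows in memory 
and, once a row-count limit is reached, writes them as a block into a 
per-operation directory under a scratch root. The directory is deleted when 
the operation closes. It is written in Python for brevity and does not use the 
real ```BufferedBlockMgr2``` or ```DiskIOMgr``` interfaces; the names 
`SpillableBuffer` and `mem_limit_rows` and the block file naming are 
hypothetical.

```python
import os
import pickle
import shutil
import tempfile


class SpillableBuffer:
    """Keeps rows in memory until mem_limit_rows is reached, then spills them.

    Hypothetical sketch: a row count stands in for a byte-based memory limit.
    """

    def __init__(self, scratch_root, mem_limit_rows):
        # Each operation gets its own sub-directory under the scratch root.
        self.scratch_dir = tempfile.mkdtemp(prefix="op-", dir=scratch_root)
        self.mem_limit_rows = mem_limit_rows
        self.in_memory = []
        self.spill_files = []

    def append(self, row):
        self.in_memory.append(row)
        if len(self.in_memory) >= self.mem_limit_rows:
            self._spill()

    def _spill(self):
        # Write the current in-memory rows as one block file and free them.
        name = "block-%d.bin" % len(self.spill_files)
        path = os.path.join(self.scratch_dir, name)
        with open(path, "wb") as f:
            pickle.dump(self.in_memory, f)
        self.spill_files.append(path)
        self.in_memory = []

    def rows(self):
        # Spilled blocks come first (oldest first), then the in-memory tail,
        # so rows are returned in insertion order.
        for path in self.spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.in_memory

    def close(self):
        # The operation is complete: remove its data from the disk.
        shutil.rmtree(self.scratch_dir, ignore_errors=True)
        self.in_memory = []
        self.spill_files = []
```

   With `mem_limit_rows=3`, appending eight rows leaves two block files on 
disk and two rows in memory, and `rows()` still returns all eight in order.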

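   For the Sort node in item 3, one common way to spill is an external merge 
sort: sort bounded-size runs in memory, write each run to the scratch area, 
then merge the runs while streaming them back. The sketch below shows the idea 
under that assumption. It is not the Doris implementation, and 
`external_sort`, `max_rows_in_memory`, and the run file format are 
hypothetical.

```python
import heapq
import os
import pickle
import tempfile


def _write_run(sorted_rows, scratch_dir):
    # Persist one sorted run as a sequence of pickled rows.
    fd, path = tempfile.mkstemp(suffix=".run", dir=scratch_dir)
    with os.fdopen(fd, "wb") as f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return path


def _read_run(path):
    # Stream a run back one row at a time instead of loading it whole.
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(rows, max_rows_in_memory, scratch_dir):
    """Yield rows in sorted order, spilling sorted runs to scratch_dir."""
    run_paths = []
    run = []
    try:
        for row in rows:
            run.append(row)
            if len(run) >= max_rows_in_memory:
                run_paths.append(_write_run(sorted(run), scratch_dir))
                run = []
        run.sort()  # the last, partial run stays in memory
        # heapq.merge keeps only the current head row of each run in memory.
        yield from heapq.merge(run, *(_read_run(p) for p in run_paths))
    finally:
        # Remove the spilled runs once the sort has been consumed.
        for path in run_paths:
            os.remove(path)
```

   The same pattern of partitioning, spilling, and streaming back is the usual 
starting point for spilling aggregation and join as well, although those nodes 
partition by hash rather than by sorted run.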

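   Two of the optimizations in item 5, the temporary file size limit and 
compression of spilled data, can be combined in one place: compress each 
block before writing it, and refuse the write if the file would exceed its 
cap. The sketch below assumes zlib as the codec and a per-file byte limit. 
Both are illustrative choices, and `CompressedSpillFile`, `max_bytes`, and 
`ScratchLimitExceeded` are hypothetical names.

```python
import os
import zlib


class ScratchLimitExceeded(Exception):
    """Raised when a write would push the spill file past its size limit."""


class CompressedSpillFile:
    def __init__(self, path, max_bytes):
        self.path = path
        self.max_bytes = max_bytes
        self.bytes_written = 0
        self._index = []  # (offset, compressed_length) for each block
        self._file = open(path, "w+b")

    def write_block(self, payload):
        compressed = zlib.compress(payload)
        # Enforce the limit on the compressed size, i.e. actual disk usage.
        if self.bytes_written + len(compressed) > self.max_bytes:
            raise ScratchLimitExceeded(
                "spill file would exceed %d bytes" % self.max_bytes)
        self._file.seek(self.bytes_written)
        self._file.write(compressed)
        self._index.append((self.bytes_written, len(compressed)))
        self.bytes_written += len(compressed)
        return len(self._index) - 1  # block id

    def read_block(self, block_id):
        offset, length = self._index[block_id]
        self._file.seek(offset)
        return zlib.decompress(self._file.read(length))

    def close(self):
        # Temporary data is removed once the operation is done with it.
        self._file.close()
        os.remove(self.path)
```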


