+1, this will be a great addition.

On Apr 3, 2024 at 09:44:01, Wail Alkowaileet <wael....@gmail.com> wrote:
> In the current cloud deployment, users are limited by the disk space of the
> cluster's nodes. However, the blob storage services provided by cloud
> providers (e.g., S3) can store a virtually "unlimited" amount of data.
> Thus, AsterixDB can provide the means to store more than the cluster's
> local drives can hold.
>
> In this proposal, we want to extend AsterixDB's capability to allow the
> local drives to act as a cache, instead of a mirror image of what's stored
> in the cloud. By "as a cache" we mean that files and pages can be
> retrieved/persisted and removed (evicted) from the local drives, according
> to some policy.
>
> The aim of this proposal is to describe and implement a mechanism called
> "*Weep and Sweep*". These are the names of the two phases that take place
> when the amount of data in the cloud exceeds the space of the cluster's
> local disks.
>
> Weep
>
> When the disk is pressured (the pressure size can be configured), the
> system will start to "weep" and devise a plan for what should be "evicted"
> according to some statistics and policies, *which are not solidified yet
> and still a work in progress.*
>
> Sweep
>
> After "weeping", a sweep operation will take place and start evicting what
> the weep's plan considers evictable. Depending on the index type
> (primary/secondary) and the storage format (row/column), the smallest
> evictable unit can differ.
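The two phases can be illustrated with a minimal Python sketch. It is not AsterixDB code: every name in it (`IndexKind`, `CachedUnit`, `weep`, `sweep`, `pressure_ratio`) is hypothetical, the least-recently-accessed policy is an assumption since the proposal leaves the real policies open, and the per-index-type granularity follows the evictable-unit table in this proposal.

```python
from dataclasses import dataclass
from enum import Enum, auto


class IndexKind(Enum):
    METADATA = auto()          # e.g., the Dataset metadata index
    SECONDARY = auto()
    PRIMARY_ROW = auto()
    PRIMARY_COLUMNAR = auto()


@dataclass
class CachedUnit:
    """A locally cached evictable unit: a whole index, or one column of a
    columnar primary index (hypothetical model)."""
    name: str
    kind: IndexKind
    size_bytes: int
    last_access: int  # logical clock; smaller means accessed longer ago


def is_evictable(unit: CachedUnit) -> bool:
    # Metadata indexes are never evicted; all other units may be.
    return unit.kind is not IndexKind.METADATA


def weep(units, used_bytes, capacity_bytes, pressure_ratio=0.8):
    """Plan evictions once usage exceeds pressure_ratio * capacity_bytes.

    Assumed policy: pick least-recently-accessed units first until usage
    would drop back to the threshold.
    """
    threshold = int(capacity_bytes * pressure_ratio)
    if used_bytes <= threshold:
        return []  # disk is not pressured: nothing to plan
    to_free = used_bytes - threshold
    plan = []
    for unit in sorted(filter(is_evictable, units), key=lambda u: u.last_access):
        if to_free <= 0:
            break
        plan.append(unit)
        to_free -= unit.size_bytes
    return plan


def sweep(plan, cache):
    """Remove every planned unit from the local cache; return bytes freed."""
    freed = 0
    for unit in plan:
        if cache.pop(unit.name, None) is not None:
            freed += unit.size_bytes
    return freed


if __name__ == "__main__":
    units = [
        CachedUnit("Metadata.Dataset", IndexKind.METADATA, 10, 0),
        CachedUnit("orders.pk/col:price", IndexKind.PRIMARY_COLUMNAR, 30, 2),
        CachedUnit("orders.sk_date", IndexKind.SECONDARY, 40, 4),
        CachedUnit("orders.pk/col:qty", IndexKind.PRIMARY_COLUMNAR, 30, 5),
    ]
    cache = {u.name: u for u in units}
    plan = weep(units, used_bytes=110, capacity_bytes=100)
    print([u.name for u in plan])  # ['orders.pk/col:price']
    print(sweep(plan, cache))      # 30
```

In this run only the `price` column is evicted; the rest of the columnar primary index stays cached, which is the finer granularity the proposal allows for columnar primary indexes, and the metadata index is never a candidate.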
> The following table shows the smallest evictable unit for each index type:
>
>   *Index Type*                               *Smallest Evictable Unit*
>   Metadata indexes (e.g., Dataset, etc.)     Not evictable
>   Secondary indexes                          Evicted as a whole
>   Primary indexes (row)                      Evicted as a whole
>   Primary indexes (columnar)                 Columns (or columns' pages)
>
> Featured Considerations
>
> - Columnar primary indexes will never be downloaded as a whole
>   - Instead, columns will be streamed from the cloud (if accessed for
>     the first time) and persisted to local disk if necessary
>   - We are considering providing a mechanism to prefetch the next columns
>     of the next mega-leaf node
>     <https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf>. The hope here
>     is to mask any latencies when reading columns from the cloud
>   - Depending on the disk pressure and the operation, the system can
>     determine whether the columns streamed from the cloud are "worthy" of
>     being cached locally. For example, if columns are read in a merge
>     operation, it might not be "wise" to persist them, as their on-disk
>     component is going to be deleted at the end of the merge operation.
>     Thus, it might be "better" to dedicate the free disk space to the
>     newly created (merged) component.
>
> Several aspects of this APE (such as the evictable units and policies) are
> not solidified yet, but the core concepts are in place and are ready for
> the community's vote :)
>
> EPIC: ASTERIXDB-3373 <https://issues.apache.org/jira/browse/ASTERIXDB-3373>
>
> --
>
> *Regards,*
> Wail Alkowaileet
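The column-caching consideration in the proposal (whether a column streamed from the cloud is "worthy" of a local copy) can be sketched as a small decision function. This is a hypothetical Python sketch, not AsterixDB code: `Operation`, `should_cache_column`, and the `reserve_bytes` headroom parameter are assumptions, and the rule (never cache columns read by a merge, otherwise cache only if enough free space remains) is one possible reading of the merge example.

```python
from enum import Enum, auto


class Operation(Enum):
    """Hypothetical kinds of operations that stream columns from the cloud."""
    QUERY = auto()
    MERGE = auto()


def should_cache_column(op: Operation, column_bytes: int,
                        free_bytes: int, reserve_bytes: int) -> bool:
    """Decide whether a streamed column is worth persisting to local disk.

    Assumed rule: a merge deletes the component it reads from, so its
    columns are never cached; that space is better kept for the merged
    component. Other reads are cached only if at least reserve_bytes of
    free disk space would remain afterwards.
    """
    if op is Operation.MERGE:
        return False
    return free_bytes - column_bytes >= reserve_bytes


if __name__ == "__main__":
    print(should_cache_column(Operation.MERGE, 10, 1_000, 100))   # False
    print(should_cache_column(Operation.QUERY, 10, 1_000, 100))   # True
    print(should_cache_column(Operation.QUERY, 950, 1_000, 100))  # False
```

Keeping the merge check separate from the space check mirrors the two factors the proposal names: the operation being performed and the current disk pressure.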