+1, this will be a great addition.

On Apr 3, 2024 at 09:44:01, Wail Alkowaileet <wael....@gmail.com> wrote:

> In the current cloud deployment, users are limited by the disk space of the
> cluster's nodes. However, the blob storage services provided by cloud
> providers (e.g., S3) can virtually store an "unlimited" amount of data.
> Thus, AsterixDB can provide the means to store beyond what the cluster's
> local drives can.
>
> In this proposal, we want to extend AsterixDB's capability to allow the
> local drives to act as a cache, instead of a mirror image of what's stored
> in the cloud. By "as a cache" we mean files and pages can be
> retrieved/persisted and removed (evicted) from the local drives, according
> to some policy.
>
> The aim of this proposal is to describe and implement a mechanism called
> "*Weep and Sweep*". These are the names of the two phases that run when the
> amount of data in the cloud exceeds the space of the cluster's local disks.
>
> Weep
>
> When the disk is pressured (the pressure size can be configured), the
> system will start to "weep" and devise a plan for what should be "evicted"
> according to some statistics and policies, *which are not solidified yet
> and still a work in progress.*
> Sweep
>
> After "weeping", a sweep operation will take place and start evicting what
> the weep's plan considers as evictable. Depending on the index type
> (primary/secondary) and the storage format (row/column), the smallest
> evictable unit can differ. The following table shows the smallest
> evictable unit for each index type:
>
> *Index Type*                              *Evictable*
> Metadata Indexes (e.g., Dataset, etc.)    Not evictable
> Secondary Indexes                         Evicted as a whole
> Primary Indexes (Row)                     Evicted as a whole
> Primary Indexes (Columnar)                Columns (or columns’ pages)
>
> Featured Considerations
>
>   - A columnar primary index will never be downloaded as a whole
>      - Instead, columns will be streamed from the cloud (if accessed for
>      the first time) and persisted to local disk if necessary
>   - We are considering providing a mechanism to prefetch the columns of
>   the next mega-leaf node
>   <https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf>. The hope here
>   is to mask any latencies when reading columns from the cloud
>   - Depending on the disk pressure and the operation, the system can
>   determine if the streamed columns from the cloud are "worthy" to be
>   cached locally. For example, if columns are read in a merge operation,
>   it might not be "wise" to persist these columns as their on-disk
>   component is going to be deleted at the end of the merge operation.
>   Thus, it might be "better" to dedicate the free space on disk for the
>   newly created/merged component.
>
>
> Multiple aspects (such as the evictable units and policies) of this APE are
> not solidified yet, but the core concepts are in place and are ready for
> the community's vote :)
>
> EPIC: ASTERIXDB-3373 <https://issues.apache.org/jira/browse/ASTERIXDB-3373>
> --
>
> *Regards,*
> Wail Alkowaileet
>
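
For illustration only, here is a minimal Python sketch of the two phases as
the proposal describes them. Everything in it is an assumption made for the
example: the names (Unit, weep, sweep, worth_caching), the
least-recently-accessed ordering, and the pressure/target ratios. The
proposal states that the statistics and policies are still a work in
progress, and AsterixDB itself is written in Java, so this is not the actual
design.

```python
from dataclasses import dataclass
from enum import Enum


class IndexKind(Enum):
    METADATA = "metadata"                  # not evictable
    SECONDARY = "secondary"                # evicted as a whole
    PRIMARY_ROW = "primary_row"            # evicted as a whole
    PRIMARY_COLUMNAR = "primary_columnar"  # evicted per column (or column pages)


@dataclass(frozen=True)
class Unit:
    """One evictable unit: a whole index, or one column of a columnar index."""
    name: str
    kind: IndexKind
    size: int          # bytes held on local disk
    last_access: int   # hypothetical logical clock; larger means more recent


def weep(units, disk_capacity, pressure_ratio, target_ratio):
    """Plan phase: return the units to evict, or [] if there is no pressure."""
    used = sum(u.size for u in units)
    if used <= disk_capacity * pressure_ratio:
        return []
    to_free = used - disk_capacity * target_ratio
    # Assumed policy: evict the least recently accessed units first.
    candidates = sorted(
        (u for u in units if u.kind is not IndexKind.METADATA),
        key=lambda u: u.last_access,
    )
    plan, freed = [], 0
    for unit in candidates:
        if freed >= to_free:
            break
        plan.append(unit)
        freed += unit.size
    return plan


def sweep(units, plan):
    """Evict phase: drop planned units locally; the cloud copy is untouched."""
    evicted = set(plan)
    return [u for u in units if u not in evicted]


def worth_caching(operation, free_bytes, column_bytes):
    """Assumed rule: do not persist columns streamed for a merge, since their
    on-disk component is deleted when the merge ends."""
    if operation == "merge":
        return False
    return column_bytes <= free_bytes


local = [
    Unit("Metadata.Dataset", IndexKind.METADATA, 10, 0),
    Unit("idx_secondary", IndexKind.SECONDARY, 30, 1),
    Unit("row_primary", IndexKind.PRIMARY_ROW, 40, 5),
    Unit("col_primary.c1", IndexKind.PRIMARY_COLUMNAR, 15, 2),
    Unit("col_primary.c2", IndexKind.PRIMARY_COLUMNAR, 15, 9),
]
plan = weep(local, disk_capacity=100, pressure_ratio=0.8, target_ratio=0.6)
remaining = sweep(local, plan)
```

In this toy run the metadata index is never a candidate, and the columnar
primary index loses a single column rather than the whole index, which
mirrors the table in the proposal.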
