+1 for this subproject!

On Wed, Apr 3, 2024 at 9:44 AM Wail Alkowaileet <wael....@gmail.com> wrote:

> In the current cloud deployment, users are limited by the disk space of the
> cluster's nodes. However, the blob storage services provided by cloud
> providers (e.g., S3) can store a virtually "unlimited" amount of data.
> Thus, AsterixDB can provide the means to store more data than the
> cluster's local drives can hold.
>
> In this proposal, we want to extend AsterixDB's capability to allow the
> local drives to act as a cache, instead of a mirror image of what's stored
> in the cloud. By "as a cache" we mean files and pages can be
> retrieved/persisted and removed (evicted) from the local drives, according
> to some policy.
>
> The aim of this proposal is to describe and implement a mechanism called
> "*Weep and Sweep*". Those are the names of the two phases that run when the
> amount of data in the cloud exceeds the space of the cluster's local disks.
>
> Weep
>
> When the disk is under pressure (the pressure threshold is configurable),
> the system will start to "weep" and devise a plan for what should be
> "evicted" according to some statistics and policies, *which are not
> solidified yet and still a work in progress.*
>
> Sweep
>
> After "weeping", a sweep operation will take place and start evicting what
> the weep's plan considers as evictable. Depending on the index type
> (primary/secondary) and the storage format (row/column), the smallest
> evictable unit can differ. The following table shows the smallest
> evictable unit for each index type:
>
> *Index Type*                              *Evictable*
> Metadata indexes (e.g., Dataset, etc.)    Not evictable
> Secondary indexes                         Evicted as a whole
> Primary indexes (row)                     Evicted as a whole
> Primary indexes (columnar)                Columns (or columns' pages)
>
> Featured Considerations
>
>    - Columnar primary indexes will never be downloaded as a whole
>       - Instead, columns will be streamed from the cloud (if accessed for
>       the first time) and persisted to local disk if necessary
>    - We are considering providing a mechanism to prefetch the next columns
>    of the next mega-leaf node
>    <https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf>. The hope here
>    is to mask any latencies when reading columns from the cloud
>    - Depending on the disk pressure and the operation, the system can
>    determine if the streamed columns from the cloud are "worthy" to be
>    cached locally. For example, if columns are read in a merge operation,
>    it might not be "wise" to persist these columns as their on-disk
>    component is going to be deleted at the end of the merge operation.
>    Thus, it might be "better" to dedicate the free space on disk for the
>    newly created/merged component.
>
>
> Multiple aspects (such as the evictable units and policies) of this APE are
> not solidified yet, but the core concepts are in place and are ready for
> the community's vote :)
>
> EPIC: ASTERIXDB-3373 <https://issues.apache.org/jira/browse/ASTERIXDB-3373>
> --
>
> *Regards,*
> Wail Alkowaileet
>
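
To make the two phases in the quoted proposal concrete, here is a minimal
sketch of how "weep" (plan) and "sweep" (evict) could fit together. It is
not AsterixDB code: every name in it (IndexKind, CachedUnit, weep, sweep,
pressure_ratio) is hypothetical, and the least-recently-used ordering is
only a stand-in, since the proposal says the actual statistics and policies
are not settled yet. The one thing taken from the proposal is the table of
evictable units: metadata indexes are never candidates, secondary and row
primary indexes are tracked as whole units, and columnar primary indexes
are tracked per column.

```python
from dataclasses import dataclass
from enum import Enum


class IndexKind(Enum):
    METADATA = "metadata"                  # never evictable
    SECONDARY = "secondary"                # evicted as a whole
    PRIMARY_ROW = "primary_row"            # evicted as a whole
    PRIMARY_COLUMNAR = "primary_columnar"  # evicted per column


@dataclass
class CachedUnit:
    name: str         # one whole index, or one column of a columnar index
    kind: IndexKind
    size_bytes: int
    last_access: int  # logical clock; larger means more recently used


def weep(units, used_bytes, capacity_bytes, pressure_ratio=0.8):
    """Plan phase: choose units to evict until usage is at or below the threshold.

    LRU ordering is an assumption made for this sketch only.
    """
    threshold = capacity_bytes * pressure_ratio
    if used_bytes <= threshold:
        return []  # no disk pressure, nothing to plan
    candidates = sorted(
        (u for u in units if u.kind is not IndexKind.METADATA),
        key=lambda u: u.last_access,
    )
    plan = []
    for unit in candidates:
        if used_bytes <= threshold:
            break
        plan.append(unit)
        used_bytes -= unit.size_bytes
    return plan


def sweep(local_cache, plan):
    """Evict phase: drop the planned units from the local cache, return bytes freed."""
    freed = 0
    for unit in plan:
        if local_cache.pop(unit.name, None) is not None:
            freed += unit.size_bytes
    return freed


if __name__ == "__main__":
    units = [
        CachedUnit("Metadata.Dataset", IndexKind.METADATA, 10, 1),
        CachedUnit("orders.idx_date", IndexKind.SECONDARY, 30, 2),
        CachedUnit("events.col.ts", IndexKind.PRIMARY_COLUMNAR, 15, 3),
        CachedUnit("orders.primary", IndexKind.PRIMARY_ROW, 40, 5),
    ]
    cache = {u.name: u for u in units}
    plan = weep(units, used_bytes=95, capacity_bytes=100)
    print([u.name for u in plan], sweep(cache, plan))
```

In this toy run the disk is at 95 of 100 bytes with an 80% threshold, so the
plan holds only the least recently used non-metadata unit, and evicting it
is enough to get back under the threshold. The cloud copy is assumed to stay
intact, so eviction here only removes the local entry.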
