Re: [VOTE][APE] Compute-Storage Separation (Cloud Mode Deployment)

Wail Alkowaileet Sat, 02 Dec 2023 12:23:35 -0800

+1

On Sat, Dec 2, 2023 at 11:25 Till Westmann <ti...@apache.org> wrote:


> +1
>
> > On Dec 2, 2023, at 11:23, Glenn Justo Galvizo <ggalv...@uci.edu> wrote:
> >
> > +1 from me as well.
> >
> >> On Dec 2, 2023, at 10:27, Ian Maxon <ima...@apache.org> wrote:
> >>
> >> +1
> >>
> >>>> On Dec 1, 2023 at 12:28:23, Murtadha Al-Hubail <hubail...@gmail.com> wrote:
> >>>
> >>> Each AsterixDB cluster today consists of one or more Node Controllers (NC)
> >>> where the data is stored and processed. Each NC has a predefined set of
> >>> storage partitions (iodevices). When data is ingested into the system, the
> >>> data is hash-partitioned across the total number of storage partitions in
> >>> the cluster. Similarly, when the data is queried, each NC will start as
> >>> many threads as the number of storage partitions it has to read and process
> >>> the data in parallel. While this shared-nothing architecture has its
> >>> advantages, it has its drawbacks too. One major drawback is the time needed
> >>> to scale the cluster. Adding a new NC to an existing cluster of (n) nodes
> >>> means writing a completely new copy of the data which will now be
> >>> hash-partitioned to the new total number of storage partitions of (n + 1)
> >>> nodes. This operation could potentially take several hours or even days,
> >>> which is unacceptable in the cloud age.
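
To illustrate why scaling is expensive under the dynamic scheme described above, here is a minimal Python sketch. It assumes a simple modulo-of-hash placement, which is an illustrative stand-in and not necessarily the hash function AsterixDB uses; the names `partition_of` and `moved_fraction`, the key format, and the choice of 2 storage partitions per NC are all hypothetical.

```python
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    # Assumed placement rule: stable hash of the key modulo the
    # cluster-wide number of storage partitions.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_fraction(keys, old_parts: int, new_parts: int) -> float:
    # Fraction of keys whose partition changes when the total
    # number of storage partitions changes.
    moved = sum(
        1 for k in keys
        if partition_of(k, old_parts) != partition_of(k, new_parts)
    )
    return moved / len(keys)


# Hypothetical cluster: 2 storage partitions per NC, growing from 4 to 5 NCs.
keys = [f"record-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_parts=8, new_parts=10)
print(f"{fraction:.0%} of records land in a different partition")
```

Under these assumptions most records map to a different partition once the total changes, which is consistent with the need to rewrite the data on every scaling operation.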
> >>>
> >>> This APE is about adding a new deployment (cloud) mode to AsterixDB by
> >>> implementing compute-storage separation to take advantage of the elasticity
> >>> of the cloud. This will require the following:
> >>>
> >>> 1. Moving from the dynamic data partitioning described earlier to a static
> >>> data partitioning based on a configurable, but fixed during a cluster's
> >>> life, number of storage partitions.
> >>> 2. Introducing the concept of a "compute partition" where each NC will have
> >>> a fixed number of compute partitions. This number could potentially be
> >>> based on the number of CPU cores it has.
> >>>
> >>> This will decouple the number of storage partitions being processed on an
> >>> NC from the number of its compute partitions.
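
The decoupling described above can be sketched as follows. This is only an illustration under stated assumptions: the constant `NUM_STORAGE_PARTITIONS`, its value of 16, the modulo-of-hash placement, and the round-robin split in `split_across_compute_partitions` are all hypothetical and are not taken from the APE.

```python
import zlib
from typing import Iterable, List

# Hypothetical value: configurable, but fixed for the cluster's lifetime.
NUM_STORAGE_PARTITIONS = 16


def storage_partition_of(key: str) -> int:
    # A record's storage partition depends only on the fixed partition
    # count, not on how many NCs are currently in the cluster.
    return zlib.crc32(key.encode("utf-8")) % NUM_STORAGE_PARTITIONS


def split_across_compute_partitions(
    assigned: Iterable[int], num_compute_partitions: int
) -> List[List[int]]:
    # Spread the storage partitions assigned to one NC over its compute
    # partitions (assumed round-robin); the two counts need not match.
    buckets: List[List[int]] = [[] for _ in range(num_compute_partitions)]
    for i, sp in enumerate(sorted(assigned)):
        buckets[i % num_compute_partitions].append(sp)
    return buckets


# An NC with 4 compute partitions that currently serves 6 storage partitions.
work = split_across_compute_partitions(range(6), 4)
print(work)
```

Some compute partitions end up with more storage partitions than others, and a compute partition may be idle if the NC has fewer storage partitions than compute partitions.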
> >>>
> >>> When an AsterixDB cluster is deployed using the cloud mode, we will do the
> >>> following:
> >>>
> >>> - The Cluster Controller will maintain a map containing the assignment of
> >>> storage partitions to compute partitions.
> >>> - New writes will be written to the NC's local storage and uploaded to an
> >>> object store (e.g. AWS S3) which will be used as a highly available shared
> >>> filesystem between NCs.
> >>> - On queries, each NC will start as many threads as its compute partitions
> >>> to process its currently assigned storage partitions.
> >>> - On scaling operations, we will simply update the assignment map and NCs
> >>> will lazily cache any data of newly assigned storage partitions from the
> >>> object store.
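
The assignment map and the scaling step in the list above can be sketched like this. It is a sketch under assumptions, not the Cluster Controller's actual logic: the round-robin policy, the function names, and the NC ids `nc1` to `nc3` are all hypothetical.

```python
from typing import Dict, List, Tuple

# (NC id, local compute partition index); a hypothetical representation.
ComputePartition = Tuple[str, int]


def compute_partitions_of(ncs: Dict[str, int]) -> List[ComputePartition]:
    # Expand {NC id: number of compute partitions} into a flat list.
    return [(nc, i) for nc in sorted(ncs) for i in range(ncs[nc])]


def build_assignment(
    num_storage_partitions: int, compute_partitions: List[ComputePartition]
) -> Dict[int, ComputePartition]:
    # Assumed policy: round-robin storage partitions over compute partitions.
    return {
        sp: compute_partitions[sp % len(compute_partitions)]
        for sp in range(num_storage_partitions)
    }


def newly_assigned(old, new, nc: str) -> List[int]:
    # Storage partitions an NC owns after the update but not before;
    # these are the ones it would lazily cache from the object store.
    before = {sp for sp, (owner, _) in old.items() if owner == nc}
    after = {sp for sp, (owner, _) in new.items() if owner == nc}
    return sorted(after - before)


# Scale from 2 NCs to 3 NCs; the 8 storage partitions stay fixed.
old_map = build_assignment(8, compute_partitions_of({"nc1": 2, "nc2": 2}))
new_map = build_assignment(
    8, compute_partitions_of({"nc1": 2, "nc2": 2, "nc3": 2})
)
to_cache_on_nc3 = newly_assigned(old_map, new_map, "nc3")
print(to_cache_on_nc3)
```

Only the map changes during scaling; no record is rehashed. Round-robin is used here for brevity and may move more partitions between NCs than a policy designed to minimize movement.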
> >>>
> >>> Improvement tickets:
> >>> Static data partitioning:
> >>> https://issues.apache.org/jira/browse/ASTERIXDB-3144
> >>> Compute-Storage Separation:
> >>> https://issues.apache.org/jira/browse/ASTERIXDB-3196
> >>>
> >>> Please vote on this APE. We'll keep this open for 72 hours and pass with
> >>> either 3 votes or a majority of positive votes.
> >>>
>
