+1 from me too.  This will provide a nice hybrid between the current
shared-nothing architecture and shared-disk-like data accessibility.

On Sat, Dec 2, 2023 at 12:24 PM Wail Alkowaileet <[email protected]> wrote:

> +1
>
> On Sat, Dec 2, 2023 at 11:25 Till Westmann <[email protected]> wrote:
>
> > +1
> >
> > > On Dec 2, 2023, at 11:23, Glenn Justo Galvizo <[email protected]> wrote:
> > >
> > > +1 from me as well.
> > >
> > >> On Dec 2, 2023, at 10:27, Ian Maxon <[email protected]> wrote:
> > >>
> > >> +1
> > >>
> > >>> On Dec 1, 2023 at 12:28:23, Murtadha Al-Hubail <[email protected]> wrote:
> > >>>
> > >>> Each AsterixDB cluster today consists of one or more Node
> > >>> Controllers (NCs) where the data is stored and processed. Each NC
> > >>> has a predefined set of storage partitions (iodevices). When data
> > >>> is ingested into the system, it is hash-partitioned across the
> > >>> total number of storage partitions in the cluster. Similarly, when
> > >>> the data is queried, each NC will start as many threads as the
> > >>> number of storage partitions it has, to read and process the data
> > >>> in parallel. While this shared-nothing architecture has its
> > >>> advantages, it has its drawbacks too. One major drawback is the
> > >>> time needed to scale the cluster. Adding a new NC to an existing
> > >>> cluster of (n) nodes means writing a completely new copy of the
> > >>> data, which will now be hash-partitioned across the new total
> > >>> number of storage partitions of (n + 1) nodes. This operation could
> > >>> potentially take several hours or even days, which is unacceptable
> > >>> in the cloud age.
> > >>>
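[Editor's note: the rehashing cost described above can be illustrated with a small, hypothetical Python sketch (not AsterixDB code). Under simple modulo placement, growing from n to n + 1 partitions remaps roughly n/(n+1) of all keys, so almost the entire dataset has to be rewritten:]

```python
# Hypothetical illustration: fraction of records that move when a
# hash-partitioned cluster grows from n to n + 1 partitions.

def partition(key: int, num_partitions: int) -> int:
    # Simple modulo placement; real systems hash the key first.
    return hash(key) % num_partitions

def moved_fraction(num_keys: int, n: int) -> float:
    # Count keys whose partition changes when one node is added.
    moved = sum(
        1 for k in range(num_keys)
        if partition(k, n) != partition(k, n + 1)
    )
    return moved / num_keys

# Growing 4 -> 5 partitions moves about 4/5 of the data.
print(moved_fraction(100_000, 4))
```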
> > >>> This APE is about adding a new deployment (cloud) mode to AsterixDB
> > >>> by implementing compute-storage separation to take advantage of the
> > >>> elasticity of the cloud. This will require the following:
> > >>>
> > >>> 1. Moving from the dynamic data partitioning described earlier to a
> > >>> static data partitioning based on a configurable number of storage
> > >>> partitions that is fixed for the cluster's lifetime.
> > >>> 2. Introducing the concept of a "compute partition", where each NC
> > >>> will have a fixed number of compute partitions. This number could
> > >>> potentially be based on the number of CPU cores it has.
> > >>>
> > >>> This will decouple the number of storage partitions being processed
> > >>> on an NC from the number of its compute partitions.
> > >>>
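[Editor's note: the decoupling above can be sketched in a few lines of hypothetical Python (not AsterixDB code). A fixed, cluster-wide set of storage partitions is fanned out across a per-NC number of compute partitions, e.g. one per CPU core, here with a simple round-robin assignment:]

```python
# Hypothetical sketch: a fixed set of storage partitions processed by a
# per-NC number of compute partitions (query threads).

def assign_to_compute(storage_partitions: list[int],
                      num_compute: int) -> dict[int, list[int]]:
    # Round-robin: each compute partition processes a subset of the
    # NC's currently assigned storage partitions.
    mapping: dict[int, list[int]] = {c: [] for c in range(num_compute)}
    for i, sp in enumerate(storage_partitions):
        mapping[i % num_compute].append(sp)
    return mapping

# An NC with 4 compute partitions handling 8 storage partitions:
print(assign_to_compute(list(range(8)), 4))
# {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```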
> > >>> When an AsterixDB cluster is deployed using the cloud mode, we will
> > >>> do the following:
> > >>>
> > >>> - The Cluster Controller will maintain a map containing the
> > >>> assignment of storage partitions to compute partitions.
> > >>> - New writes will be written to the NC's local storage and uploaded
> > >>> to an object store (e.g. AWS S3), which will be used as a highly
> > >>> available shared filesystem between NCs.
> > >>> - On queries, each NC will start as many threads as its compute
> > >>> partitions to process its currently assigned storage partitions.
> > >>> - On scaling operations, we will simply update the assignment map,
> > >>> and NCs will lazily cache any data of newly assigned storage
> > >>> partitions from the object store.
> > >>>
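[Editor's note: the scaling behavior in the list above can be illustrated with a hypothetical Python sketch (not AsterixDB code; the map layout and even-spread policy are assumptions for illustration). Scaling out only rewrites the assignment map; no data is re-partitioned, since the new owner lazily fetches its partitions from the object store:]

```python
# Hypothetical sketch of the Cluster Controller's assignment map:
# scaling out rewrites the map only, never the data itself.

def rebalance(num_storage_partitions: int,
              ncs: list[str]) -> dict[int, str]:
    # Spread the fixed storage partitions evenly across the NCs.
    return {sp: ncs[sp % len(ncs)] for sp in range(num_storage_partitions)}

before = rebalance(8, ["nc1", "nc2"])
after = rebalance(8, ["nc1", "nc2", "nc3"])  # one NC added

# Only reassigned partitions need to be lazily cached by their new
# owner; the object store copy remains the source of truth.
reassigned = [sp for sp in before if before[sp] != after[sp]]
print(reassigned)
# [2, 3, 4, 5]
```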
> > >>> Improvement tickets:
> > >>> Static data partitioning:
> > >>> https://issues.apache.org/jira/browse/ASTERIXDB-3144
> > >>> Compute-Storage Separation:
> > >>> https://issues.apache.org/jira/browse/ASTERIXDB-3196
> > >>>
> > >>> Please vote on this APE. We'll keep this open for 72 hours and pass
> > >>> with either 3 votes or a majority of positive votes.
