With the SIP-20 proposal, there is a single copy of each shard on shared storage,
and any existing or future replica of that shard (leader or not) accesses the same
storage area.
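
As a rough illustration of that access pattern (this is not the actual SIP-20
code; SharedStorageClient, SharedShardReader and the cache layout are made-up
names), the read path could look something like this in Java, with every
replica resolving the same remote location for a shard and keeping a copy on
local ephemeral disk:

    // Hypothetical sketch only, not SIP-20's real classes.
    import java.nio.file.Files;
    import java.nio.file.Path;

    interface SharedStorageClient {
      // e.g. backed by S3/GCS; copies one remote file to a local path
      void download(String remoteKey, Path localTarget) throws Exception;
    }

    class SharedShardReader {
      private final SharedStorageClient storage;
      private final Path localCacheDir; // ephemeral disk, may be lost on restart

      SharedShardReader(SharedStorageClient storage, Path localCacheDir) {
        this.storage = storage;
        this.localCacheDir = localCacheDir;
      }

      // Every replica (leader or not) computes the same remote key for a
      // shard file, so there is a single copy of the shard on shared storage.
      Path open(String collection, String shard, String fileName) throws Exception {
        String remoteKey = collection + "/" + shard + "/" + fileName;
        Path cached = localCacheDir.resolve(remoteKey);
        if (!Files.exists(cached)) {           // cache miss: pull from shared storage
          Files.createDirectories(cached.getParent());
          storage.download(remoteKey, cached); // later reads hit local disk
        }
        return cached;
      }
    }

The local copy is only a cache: it can be dropped at any time and re-downloaded
from shared storage, which is what keeps the node stateless.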
With HdfsDirectory, there is a copy on shared storage for each replica. During
writes, the usual SolrCloud replication strategies between replicas are used
(writing to multiple replicas for durability), and when a node goes down it has
to be restarted (or a replacement node started) to recover the latest state from
the transaction log or from that replica’s remotely stored segments. Note that
the SolrCloud philosophy of maintaining multiple replicas for durability and the
durability guarantees of S3 overlap (are redundant) when multiple replicas are
each individually persisted in S3. SIP-20 instead delegates durability to the
shared store (a rough sketch of the two write paths follows the quoted thread
below). Basically, HdfsDirectory uses remote storage as each node’s local
storage (is that redundant with remotely backed local storage such as AWS EBS?),
whereas SIP-20 proposes a stronger separation between compute and storage.

Another major difference is that with SIP-20, once the data has been downloaded
to a node from remote storage, it is present (cached) on the node’s local
(ephemeral) disk. This allows faster access than the HdfsDirectory
implementation, which uses memory caching (the OS buffer cache, I believe) but
no local disk caching.

SIP-20 also enables more flexibility in how SolrCloud elasticity could be
implemented (scaling up and down, balancing load between nodes), as well as
better node scalability (dropping local disk content for unused replicas and
reloading it when needed).

Ilan

On Wed 17 Jan 2024 at 14:59, Mike Drob <md...@mdrob.com> wrote:

> Don’t we already have an implementation of a shared storage backend using
> HDFS (and S3 transitively through the HDFS-S3 connectors)?
>
> On Wed, Jan 17, 2024 at 5:26 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> > Hi,
> > Thanks for asking that question.
> >
> > The separation of compute and storage would be relevant for the nodes
> > having the "data" role, i.e. nodes that host indexes.
> >
> > SIP-20 offers a way for these indexes to be on shared storage (S3/GCS
> > etc.) and not persisted long term on each individual node, making the
> > nodes themselves stateless (they can lose all disk content as they
> > restart and everything will work ok).
> > Given that the coordinator and overseer roles do not require local state
> > (persistent storage on the node’s local disk), SIP-20 makes all the
> > nodes stateless, the same way it does when no node roles are used (state
> > is then only maintained in ZooKeeper and on the shared storage backend).
> >
> > If a specific assignment of node roles works for a given cluster/use
> > case, adopting SIP-20 in that cluster would change the storage of
> > indexes and the way each update is handled (distributed to multiple
> > replicas without SIP-20, or processed by a single replica plus shared
> > storage with SIP-20), but the roles would likely stay unchanged:
> > some nodes will be preferred for hosting the Overseer or for
> > coordinating queries, and the same subset of nodes will be handling
> > indexes (although in a different way).
> >
> > Hope that helps,
> > Ilan
> >
> > On Tue, Jan 16, 2024 at 8:57 AM rajani m <rajinima...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > >    Saw a post on the dev mailing list about SIP-20 Separation of
> > > Compute and Storage
> > > <https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud>.
> > > Trying to understand what extra features it adds when compared to
> > > configuring a solrcloud cluster by leveraging node roles
> > > <https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html>
> > > ?
> > >
> > > Thanks,
> > > Rajani
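
To make the write-path difference mentioned above concrete (updates going to
multiple replicas without SIP-20 vs. a single replica plus shared storage with
SIP-20), here is a second rough Java sketch; Replica and SegmentPusher are
made-up stand-ins, not actual Solr APIs:

    // Hypothetical sketch only; not how Solr's update code is structured.
    import java.util.List;

    class UpdatePathSketch {

      interface Replica { void index(String doc); }
      interface SegmentPusher { void pushNewSegments(String shard); } // e.g. push to S3/GCS

      // Without SIP-20: the update is forwarded to every replica of the
      // shard, and durability comes from having several independent copies.
      static void classicUpdate(String doc, List<Replica> replicas) {
        for (Replica r : replicas) {
          r.index(doc);
        }
      }

      // With SIP-20: a single replica indexes the update locally, then the
      // resulting segments are pushed once to shared storage, which is what
      // provides durability.
      static void sharedStorageUpdate(String doc, Replica leader,
                                      SegmentPusher pusher, String shard) {
        leader.index(doc);
        pusher.pushNewSegments(shard);
      }
    }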