Don’t we already have an implementation of a shared storage backend using HDFS (and, transitively, S3 through the HDFS-S3 connectors)?

On Wed, Jan 17, 2024 at 5:26 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Hi,
> Thanks for asking that question.
>
> The separation of compute and storage would be relevant for the nodes
> having the "data" role, i.e. nodes that host indexes.
>
> SIP-20 offers a way for these indexes to be on shared storage (S3/GCS
> etc) and not persisted long term on each individual node, making the
> nodes themselves stateless (can lose all disk content as they restart
> and everything will work ok).
> Given roles coordinator and overseer do not require local state (local
> persistent storage on the node local disk), SIP-20 makes all the nodes
> stateless, the same way it does when no node roles are used (state is
> then only maintained in ZooKeeper and the shared storage backend).
>
> If a specific assignment of node roles works for a given cluster/use
> case, adopting SIP-20 in that cluster would change the storage of
> indexes and the way each update is handled (distributed to multiple
> replicas without SIP-20 or being processed by a single replica and
> shared storage with SIP-20) but the roles would likely stay unchanged:
> some nodes will be preferred for hosting the Overseer or for
> coordinating queries, and the same subset of nodes will be handling
> indexes (although in a different way).
>
> Hope that helps,
> Ilan
>
> On Tue, Jan 16, 2024 at 8:57 AM rajani m <rajinima...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Saw a post on the dev-mailing list about SIP-20 Separation of Compute
> > and Storage
> > <https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud>.
> > Trying to understand what extra features it adds when compared to
> > configuring a solrcloud cluster by leveraging node roles
> > <https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html>?
> >
> > Thanks,
> > Rajani