MarkLogic had this as a feature early on: E nodes (execute) and D nodes (data). 
I don’t remember anybody using it. It was probably a special feature built for 
some customer. Once it was built, it wasn’t a big deal to maintain, but it was 
extra code that wasn’t adding much value.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Nov 29, 2025, at 9:34 AM, Ilan Ginzburg <[email protected]> wrote:
> 
> The only code drop was the initial branch
> https://github.com/apache/solr/tree/jira/solr-17125-zero-replicas
> That branch is a cleaned up version (and a better one really) of the
> production code Salesforce was running back then.
> Changes done since were not ported.
> 
> Any Solr node being able to fetch the latest copy of a shard means cores no
> longer have to be discovered and opened eagerly on node startup; they can be
> discovered and opened lazily when needed (our clusters now scale to 100,000+
> collections). It also removes shard leader elections in favor of a best
> effort to index on the same replica, limits the number of open cores by
> using transient cores in SolrCloud mode, etc.
> 
> A clear benefit of such a separation of compute and storage is when there's
> a high number of indexes, with only a small subset active at any given
> time. This meshes well with multi-tenant hosting scenarios: many customers,
> few of them active at once.
> When all indexes are active, they have to be loaded on nodes anyway.
> 
> Ilan
> 
> On Sat, Nov 29, 2025 at 12:52 AM Matt Kuiper <[email protected]> wrote:
> 
>> Thanks for your reply. What you say makes sense.
>> 
>> Is there perhaps a fork of the Solr baseline with your changes available
>> for others to use?
>> 
>> Your solution is very compelling!
>> 
>> Matt
>> 
>> On Thu, Nov 27, 2025 at 3:39 AM Ilan Ginzburg <[email protected]> wrote:
>> 
>>> I don't believe there will be future work on this topic in the context of
>>> the Solr project.
>>> 
>>> With the experience of running a modified Solr with separation of compute
>>> and storage in production at high scale for a few years now, the changes
>>> (to the Cloud part of Solr, though there's unfortunately no real
>>> separation between single-node Solr and SolrCloud code) are too big to
>>> make this approach optional. Efficiently implementing such a separation
>>> requires it to be the only storage/persistence layer, and it changes
>>> durability/availability and cluster management assumptions in fundamental
>>> ways.
>>> 
>>> Ilan
>>> 
>>> On Fri, Nov 21, 2025 at 9:37 PM mtn search <[email protected]> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am curious if there is current/future work planned for:
>>>> 
>>>> https://issues.apache.org/jira/browse/SOLR-17125
>>>> 
>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
>>>> 
>>>> Thanks,
>>>> Matt
>>>> 
>>> 
>> 