I ran some benchmarks years ago and Varnish was the only thing faster than 
native Solr caching.

If you have a big cluster, a single shared cache can have a much higher hit
rate than a set of separate caches. I should do the probability math at some
point, but with a single cache, the second identical request is already a
cache hit. With N separate nodes, it can take up to 2N requests.
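
Here’s a rough back-of-the-envelope simulation of that effect (Python; the
Zipf-ish traffic, round-robin routing, and all the numbers are made up, so
this is a sketch of the idea rather than the real math):

import random

# Toy comparison: one shared cache vs. N separate per-node caches.
# Traffic is a crude heavy-tailed (Zipf-ish) query distribution and
# requests are routed round-robin. All parameters are arbitrary.
random.seed(42)
N = 8                   # nodes
REQUESTS = 100_000
DISTINCT = 5_000        # distinct queries

def draw_query():
    # crude heavy-tailed draw over query ids 0..DISTINCT-1
    return int(random.paretovariate(1.1)) % DISTINCT

shared = set()
per_node = [set() for _ in range(N)]
shared_hits = per_node_hits = 0

for i in range(REQUESTS):
    q = draw_query()
    if q in shared:                 # shared cache: any repeat is a hit
        shared_hits += 1
    shared.add(q)
    node = i % N                    # round-robin routing
    if q in per_node[node]:         # separate caches: the repeat must
        per_node_hits += 1          # land on the same node to be a hit
    per_node[node].add(q)

print(f"shared cache hit rate:    {shared_hits / REQUESTS:.1%}")
print(f"separate caches hit rate: {per_node_hits / REQUESTS:.1%}")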

At Netflix, the caching built into our Citrix load balancers would handle
about 80% of queries once they were warmed up. Saved us some serious money
on search servers. I’m sure search is different there now; this was with
15 million subscribers instead of the current 300+ million.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Nov 29, 2025, at 11:23 AM, matthew sporleder <[email protected]> wrote:
> 
> Can't you get something close-ish to this with solrcloud as-is?
> 
> Use a traffic routing layer with no replicas and then some fat data
> nodes with fast disks + well-tuned cache settings.
> 
> My thinking is the cache settings would do the magic of the hot/cold
> stuff being described and the routing layer could pretty much
> auto-scale (with its own query caching). The data layer could even be
> segmented with specific collections on different hardware.
> 
> Speaking of caching - we used to run varnish in front of solr with
> good effect as well.
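> 
> To make that concrete, a toy sketch of a caching layer in front of Solr
> (Python stdlib only; the Solr URL, port, and TTL are made up, and there's
> no eviction or error handling, so this is just the shape of the idea):
> 
> import time
> import urllib.request
> from http.server import BaseHTTPRequestHandler, HTTPServer
> 
> SOLR = "http://localhost:8983"   # assumed Solr base URL
> TTL = 60                         # seconds a cached response stays fresh
> cache = {}                       # request path+query -> (expires_at, body)
> 
> class CachingProxy(BaseHTTPRequestHandler):
>     def do_GET(self):
>         hit = cache.get(self.path)
>         if hit and hit[0] > time.time():
>             body = hit[1]                          # serve from cache
>         else:
>             with urllib.request.urlopen(SOLR + self.path) as r:
>                 body = r.read()                    # miss: ask Solr
>             cache[self.path] = (time.time() + TTL, body)
>         self.send_response(200)
>         self.send_header("Content-Type", "application/json")
>         self.end_headers()
>         self.wfile.write(body)
> 
> HTTPServer(("", 8080), CachingProxy).serve_forever()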
> 
> On Sat, Nov 29, 2025 at 2:11 PM Walter Underwood <[email protected]> 
> wrote:
>> 
>> MarkLogic had this as a feature early on: E nodes (execute) and D nodes 
>> (data). I don’t remember anybody using it. It was probably a special for 
>> some customer. Once it was built, it wasn’t a big deal to maintain, but it 
>> was extra code that wasn’t adding much value.
>> 
>> wunder
>> Walter Underwood
>> [email protected]
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Nov 29, 2025, at 9:34 AM, Ilan Ginzburg <[email protected]> wrote:
>>> 
>>> The only code drop was the initial branch:
>>> https://github.com/apache/solr/tree/jira/solr-17125-zero-replicas
>>> That branch is a cleaned-up version (and really a better one) of the
>>> production code Salesforce was running back then. Changes made since
>>> then have not been ported.
>>> 
>>> Because any Solr node can fetch the latest copy of a shard, we no longer
>>> have to discover and open every core on a node; cores are discovered and
>>> opened lazily when needed (our clusters now scale to 100,000+ collections).
>>> We also no longer do shard leader elections, instead making a best effort
>>> to index on the same replica, and we limit the number of open cores by
>>> using transient cores in SolrCloud mode, etc.
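>>> 
>>> As a toy illustration of the bounded lazy-open idea (Python, not our
>>> actual code; the cap is arbitrary and opening a core is stubbed out):
>>> 
>>> from collections import OrderedDict
>>> 
>>> MAX_OPEN = 100                          # arbitrary cap on open cores
>>> open_cores = OrderedDict()              # core name -> handle, LRU order
>>> 
>>> def open_core(name):
>>>     return f"<core {name}>"             # stand-in for really opening one
>>> 
>>> def get_core(name):
>>>     if name in open_cores:
>>>         open_cores.move_to_end(name)    # mark as recently used
>>>     else:
>>>         if len(open_cores) >= MAX_OPEN:
>>>             open_cores.popitem(last=False)   # drop least recently used
>>>         open_cores[name] = open_core(name)   # open lazily, on first use
>>>     return open_cores[name]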
>>> 
>>> A clear benefit of such a separation of compute and storage shows up when
>>> there is a large number of indexes with only a small subset active at any
>>> given time. This meshes well with hosting scenarios that have many
>>> customers but few of them active simultaneously. When all indexes are
>>> active, they have to be loaded on nodes anyway.
>>> 
>>> Ilan
>>> 
>>> On Sat, Nov 29, 2025 at 12:52 AM Matt Kuiper <[email protected]> wrote:
>>> 
>>>> Thanks for your reply. What you say makes sense.
>>>> 
>>>> Is there perhaps a fork of the Solr baseline with your changes available
>>>> for others to use?
>>>> 
>>>> Your solution is very compelling!
>>>> 
>>>> Matt
>>>> 
>>>> On Thu, Nov 27, 2025 at 3:39 AM Ilan Ginzburg <[email protected]> wrote:
>>>> 
>>>>> I don't believe there will be future work on this topic in the context of
>>>>> the Solr project.
>>>>> 
>>>>> Having now run a modified Solr with separation of compute and storage in
>>>>> production at high scale for a few years, we have found that the changes
>>>>> (to the Cloud part of Solr, though there's unfortunately no real
>>>>> separation between single-node Solr and SolrCloud code) are too big to
>>>>> make this approach optional. Efficiently implementing such a separation
>>>>> requires it to be the only storage/persistence layer, and it changes
>>>>> durability/availability and cluster management assumptions in fundamental
>>>>> ways.
>>>>> 
>>>>> Ilan
>>>>> 
>>>>> On Fri, Nov 21, 2025 at 9:37 PM mtn search <[email protected]> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I am curious if there is current/future work planned for:
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/SOLR-17125
>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
>>>>>> 
>>>>>> Thanks,
>>>>>> Matt
>>>>>> 
>>>>> 
>>>> 
>> 
