Derek,

Please read the text of SIP-20. It will store the primary copy of a shard in the object store but always cache it on local disk on each node, which in turn is cached in virtual memory. So query speed is not limited by S3, though indexing latency may be. And you can scale to zero without data loss.
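A rough sketch of that layering, purely illustrative and not taken from SIP-20 or Solr: the class name, the dict standing in for an S3/MinIO bucket, and the file names are all hypothetical. It only shows the read-through idea, where the object store holds the primary copy and each node's local disk is a cache that can be thrown away.

```python
import tempfile
from pathlib import Path


class ReadThroughShardStore:
    """Toy read-through cache: object store is primary, local disk is a cache."""

    def __init__(self, object_store: dict, cache_dir: Path) -> None:
        self.object_store = object_store  # hypothetical stand-in for a bucket
        self.cache_dir = cache_dir        # node-local disk, safe to lose
        self.remote_reads = 0             # counts fetches from the object store

    def write(self, name: str, data: bytes) -> None:
        # Indexing pays the object-store latency: the primary copy goes there.
        self.object_store[name] = data
        (self.cache_dir / name).write_bytes(data)

    def read(self, name: str) -> bytes:
        local = self.cache_dir / name
        if not local.exists():
            # Cache miss: fetch once from the object store, keep a local copy.
            self.remote_reads += 1
            local.write_bytes(self.object_store[name])
        # Served from local disk; the OS page cache keeps hot files in RAM.
        return local.read_bytes()


def demo() -> tuple:
    bucket: dict = {}
    # First node indexes a segment, then goes away ("scale to zero").
    with tempfile.TemporaryDirectory() as first_dir:
        ReadThroughShardStore(bucket, Path(first_dir)).write("segment_1", b"index data")
    # A fresh node with an empty disk can still serve the data.
    with tempfile.TemporaryDirectory() as second_dir:
        node = ReadThroughShardStore(bucket, Path(second_dir))
        first = node.read("segment_1")
        second = node.read("segment_1")
        return first, second, node.remote_reads


if __name__ == "__main__":
    print(demo())
```

In this toy version only the first read on a fresh node touches the object store; later reads come from local disk, which is the point Derek's RAM-caching observation relies on.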
Jan Høydahl

> On 16 Oct 2024, at 21:15, Derek C <de...@hssl.ie> wrote:
>
> Hi all - this is an interesting thread. I use S3 a lot and SOLR a lot and
> I also love [the older versions of*] MinIO too and have it running on
> dedicated storage servers with a load of SSDs but not for production use.
>
> S3 is great for many things but, like HTTP requests, it's very laggy.
>
> With SOLR I've found for performance I have to give the underlying servers
> (Linux/Ubuntu EC2 instances) enough system RAM to cache the SOLR
> collections - when I do this I see that (with iotop) my disk reads drop to
> zero.
>
> I tried lots of silly stuff in the past (SOLR nodes with ramdisks) but then
> I found if you give Linux enough spare RAM it'll just cache the entire SOLR
> data for you.
>
> My workload is not massive I think (biggest collection is about 3 million
> documents - the website is getting around 11 million page views a month and
> some pages do 10 SOLR queries to render the page - all works very well)
> although I do have a collection with about 3 million documents that I do
> KNN nearest neighbour searches on and I found that without completely
> caching the collection in RAM it's too slow to ever return in time (for a
> web site response).
>
> So, from all that, I can't see how S3 object storage could ever be used for
> anything other than backups (I would like to know how to schedule backups
> to something like S3 - right now I manually pull documents to a big backup
> file and over to ElasticSearch as a backup).
>
> Derek
>
> * I love the older version of MinIO because it doesn't store the files in a
> weird way - this means I just have an NGINX server looking at the
> directory (the underlying MinIO bucket directory) and it just works. It's
> like the new MinIO stores the files in its own format - maybe great for
> S3-like things (IDK maybe versions? I'm not sure) but it's not possible
> just to look at the real files on disk or through a web server so I've stuck
> with the older MinIO for my on-prem S3 R&D.
>
>> On Wed, Oct 16, 2024 at 7:25 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>
>> Hi,
>>
>> I don't know much about it, but our backup S3 feature works with e.g.
>> MinIO which is S3 compatible. And most often these are made pluggable as
>> well. I encourage you to engage in the discussion on mailing list and/or
>> JIRA with your input.
>>
>> Jan
>>
>>> On 16 Oct 2024, at 19:56, mtn search <search...@gmail.com> wrote:
>>>
>>> Hi Jan,
>>>
>>> Do you know if there would be many hurdles to implement SOLR SIP-20 with
>>> on-prem object storage rather than say S3 in the cloud?
>>>
>>> Matt
>>>
>>>> On Tue, Oct 15, 2024 at 5:28 PM mtn search <search...@gmail.com> wrote:
>>>>
>>>> Thanks Jan!
>>>>
>>>>> On Tue, Oct 15, 2024 at 4:46 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> That is correct. But plans are being made to add such support:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
>>>>>
>>>>> Jan
>>>>>
>>>>>> On 15 Oct 2024, at 20:17, mtn search <search...@gmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Just want to confirm that while Solr provides a backup/restore API to S3 -
>>>>>> object storage - there is no option to run Solr on object storage. Is this
>>>>>> correct?
>>>>>>
>>>>>> Thanks,
>>>>>> Matt
>
> --
> Derek Conniffe
> Harvey Software Systems Ltd T/A HSSL
> Telephone (IRL): 086 856 3823
> Telephone (US): (650) 449 6044
> Skype: dconnrt
> Email: de...@hssl.ie
>
> *Disclaimer:* This email and any files transmitted with it are confidential
> and intended solely for the use of the individual or entity to whom they
> are addressed.