Hi all - this is an interesting thread.  I use S3 and SOLR a lot, and I
also love [the older versions of*] MinIO, which I run on dedicated storage
servers with a load of SSDs, though not for production use.

S3 is great for many things but, since every access is an HTTP request,
it's very laggy.

With SOLR I've found that, for performance, I have to give the underlying
servers (Linux/Ubuntu EC2 instances) enough system RAM to cache the SOLR
collections.  When I do this, iotop shows my disk reads dropping to zero.

I tried lots of silly stuff in the past (SOLR nodes with ramdisks) but then
I found that if you give Linux enough spare RAM, it'll just keep all of the
SOLR data in the page cache for you.
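
As a rough way to check the "enough spare RAM" point above, here is a
minimal Python sketch that compares the on-disk size of an index directory
with the MemAvailable figure from /proc/meminfo.  The helper names, the
80% headroom factor and the example path are my own assumptions for
illustration, not anything SOLR provides, and the result is only a
heuristic.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of {field: bytes}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields are reported in kB; a few (e.g. HugePages_Total)
        # are plain counts and carry no unit.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def index_fits_in_cache(index_bytes, meminfo, headroom=0.8):
    """Heuristic: does the index fit in `headroom` of MemAvailable?

    MemAvailable is the kernel's estimate of memory usable without
    swapping, and it counts reclaimable page cache, so it is a fair
    upper bound on what the page cache could hold.
    """
    return index_bytes <= headroom * meminfo["MemAvailable"]


# Example use on a Linux host (the path is hypothetical):
#   with open("/proc/meminfo") as f:
#       mem = parse_meminfo(f.read())
#   print(index_fits_in_cache(dir_size_bytes("/var/solr/data"), mem))
```

If this returns False, I'd expect to keep seeing disk reads in iotop under
load, but measuring is the only way to be sure.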

My workload is not massive, I think: the biggest collection is about 3
million documents, the website gets around 11 million page views a month,
and some pages run 10 SOLR queries to render.  That all works very well.
However, I do have a collection with about 3 million documents that I run
KNN (nearest neighbour) searches on, and I found that unless the collection
is completely cached in RAM, it's too slow to return in time for a website
response.

So, from all that, I can't see how S3 object storage could ever be used for
anything other than backups.  (I would like to know how to schedule backups
to something like S3 - right now I manually pull documents into a big
backup file and copy them over to ElasticSearch as a backup.)
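
On the scheduling question, one possible approach is to call the
Collections API BACKUP action from a small script and run that script from
cron.  The sketch below only builds the request URL.  It assumes an S3
backup repository is already configured in solr.xml; the repository name
"s3", the base URL, the collection name and the timestamped backup name
are all hypothetical choices of mine.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def backup_url(solr_base, collection, repository, location, now=None):
    """Build a Collections API BACKUP request URL.

    `repository` must match a backup repository name defined in solr.xml;
    `location` is the path inside that repository.  The timestamped
    backup name is just one naming choice (an assumption of this sketch).
    """
    now = now or datetime.now(timezone.utc)
    params = {
        "action": "BACKUP",
        "name": f"{collection}-{now:%Y%m%d-%H%M%S}",
        "collection": collection,
        "repository": repository,
        "location": location,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Triggering the backup (hypothetical host and names):
#   from urllib.request import urlopen
#   url = backup_url("http://localhost:8983/solr", "products", "s3", "/")
#   print(urlopen(url).read().decode())
#
# A crontab entry such as the following would then run it nightly
# (the script path is hypothetical):
#   0 2 * * * /usr/bin/python3 /opt/scripts/solr_backup.py
```

I haven't run this against a real S3 repository, so treat it as a starting
point rather than a tested recipe.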

Derek

* I love the older version of MinIO because it doesn't store the files in a
weird way - this means I can just have an NGINX server looking at the
directory (the underlying MinIO bucket directory) and it just works.  The
new MinIO seems to store the files in its own format - maybe great for
S3-like things (IDK, maybe versioning?  I'm not sure) but it's not possible
just to look at the real files on disk or through a web server, so I've
stuck with the older MinIO for my on-prem S3 R&D.

On Wed, Oct 16, 2024 at 7:25 PM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> I don't know much about it, but our backup S3 feature works with e.g.
> MinIO which is S3 compatible. And most often these are made pluggable as
> well. I encourage you to engage in the discussion on the mailing list and/or
> JIRA with your input.
>
> Jan
>
> > On 16 Oct 2024, at 19:56, mtn search <search...@gmail.com> wrote:
> >
> > Hi Jan,
> >
> > Do you know if there would be many hurdles to implement SOLR SIP-20 with
> > on-prem object storage rather than say S3 in the cloud?
> >
> > Matt
> >
> > On Tue, Oct 15, 2024 at 5:28 PM mtn search <search...@gmail.com> wrote:
> >
> >> Thanks Jan!
> >>
> >> On Tue, Oct 15, 2024 at 4:46 PM Jan Høydahl <jan....@cominvent.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> That is correct. But plans are being made to add such support:
> >>>
> >>> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
> >>>
> >>> Jan
> >>>
> >>>> On 15 Oct 2024, at 20:17, mtn search <search...@gmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> Just want to confirm that while Solr provides a backup/restore API to
> >>>> S3 - object storage - there is no option to run Solr on object
> >>>> storage.  Is this correct?
> >>>>
> >>>> Thanks,
> >>>> Matt
> >>>
> >>>
>
>

-- 
Derek Conniffe
Harvey Software Systems Ltd T/A HSSL
Telephone (IRL): 086 856 3823
Telephone (US): (650) 449 6044
Skype: dconnrt
Email: de...@hssl.ie

