Re: Solr Cloud + zookeeper+ kubernetes setup

2023-11-23 Thread ufuk yılmaz
There’s a fundamental downside to embedded ZK, or to putting ZK on the same
node as Solr. When many queries are running, a Solr replica gets overloaded and
starts to affect the node itself (sometimes you can’t even ssh into the node
when Solr is too busy). It also affects the ZK instance on the same node, which
can cause a chain of downed Solr replicas and ZK nodes, reducing the overall
availability of the cluster.

When you need a high-read, high-write cluster, I think it’s better to put
ZooKeeper on its own machines.
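
In Kubernetes terms, a similar degree of isolation can be approximated without
dedicated machines by keeping ZK pods off the nodes that run Solr. A minimal
sketch using pod anti-affinity (the `app: solr` label is an assumption; match
it to whatever labels your Solr pods actually carry):

```yaml
# Fragment of a ZooKeeper pod/StatefulSet template spec.
# Schedules ZK pods only onto nodes that host no pod labeled app=solr.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: solr          # assumed label on the Solr pods
        topologyKey: kubernetes.io/hostname
```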

-ufuk yilmaz 

> On 23 Nov 2023, at 00:25, Jan Høydahl  wrote:
> 
> I think there is a product that speaks the ZK API but stores everything in etcd :)
> 
> Also, you may want to check out 
> https://cwiki.apache.org/confluence/display/SOLR/SIP-18%3A+A+Solr+Kubernetes+Module+for+native+integration
> (an initiative that will let you use ConfigMaps for configSet files etc., 
> i.e. somewhat more cloud-native). And 
> https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper, 
> if it gains traction, will embed ZK in Solr processes, making it an 
> implementation detail that users don't need to manage separately. 
> 
> Jan
> 
>> 22. nov. 2023 kl. 14:30 skrev matthew sporleder :
>> 
>> It's annoying that zookeeper is a hard dependency when etcd is just
>> sitting there and all of the stuff in zk could easily fit into
>> ConfigMaps.
> 



Re: Solr Cloud + zookeeper+ kubernetes setup

2023-11-23 Thread Radu Gheorghe
Hi Vince,

Here's a tutorial for installing an external ZK + Solr on k8s:
https://sematext.com/blog/solr-operator-autoscaling-tutorial/

Hope it helps.

Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
https://sematext.com/ 


On Wed, Nov 22, 2023 at 6:02 PM Vince McMahon 
wrote:

> Ramesh,
>
> Can you elaborate more on the installation of ext. Zookeeper and how to
> make it work with solr on k8?
>
> On Wed, Nov 22, 2023, 6:26 AM Ramesh Balasubaramaniam 
> wrote:
>
> > Yes, possible. How many shards and replicas you should plan to create
> > for the 200 million documents depends on the index size (how many
> > stored fields) and so on.
> > An external ZooKeeper is best. Solr in a Kubernetes setup also works well.
> >
> >
> > Thanks,
> > Ramesh
> >
> > On Wed, Nov 22, 2023 at 3:30 AM Vince McMahon <
> > sippingonesandze...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'd like to thank many of you for your valuable advice. You are like
> > > the big brother I never had.
> > > I wish those of you in the US a Happy Thanksgiving.
> > >
> > > I have a boss who asks for the sky, and I need clarity on whether this
> > > is at all possible. I have very large cores: 200 million docs. The
> > > situation calls for SolrCloud for sharding and ZooKeeper for high
> > > availability, and these need to run in a Kubernetes environment. Is it
> > > doable? The tutorial mentioned embedded ZooKeeper vs. external. Which
> > > should be used in this situation? Is there a how-to on building this
> > > setup?
> > >
> > > Thanks!
> > >
> >
> >
> > --
> > Thanks,
> > Ramesh
> >
>


Re: Solr Cloud + zookeeper+ kubernetes setup

2023-11-23 Thread Jan Høydahl
**NOW** In a k8s setting, ZK gets its own separate pods (with both of the 
suggested Helm charts), and a pod hooks into the Linux kernel to guarantee some 
resources, so it is not starved by hot Solr pods on the same k8s node. So there 
is no need for a separate machine; a separate pod is fine. 
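
For context, the resource guarantee mentioned above is the standard container
`resources` block; when `requests` equals `limits` the pod gets the Guaranteed
QoS class, so a hot Solr pod on the same node cannot starve it. A generic
sketch (the sizes and image tag are illustrative, not a recommendation):

```yaml
# ZooKeeper container fragment; equal requests/limits => Guaranteed QoS.
containers:
  - name: zookeeper
    image: zookeeper:3.8        # illustrative tag
    resources:
      requests:
        cpu: "500m"
        memory: 1Gi
      limits:
        cpu: "500m"
        memory: 1Gi
```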

**FUTURE, NOT EXISTING**: With SIP-14, Solr may get a new "zookeeper" node role 
that lets you spin up a Solr pod just for hosting an embedded ZK.

> 23. nov. 2023 kl. 09:02 skrev ufuk yılmaz :
> 
> There’s a fundamental downside to embedded ZK, or to putting ZK on the same 
> node as Solr. When many queries are running, a Solr replica gets overloaded 
> and starts to affect the node itself (sometimes you can’t even ssh into 
> the node when Solr is too busy). It also affects the ZK instance on the same 
> node, which can cause a chain of downed Solr replicas and ZK nodes, reducing 
> the overall availability of the cluster. 
> 
> When you need a high-read, high-write cluster, I think it’s better to put 
> ZooKeeper on its own machines. 
> 
> -ufuk yilmaz 
> 
>> On 23 Nov 2023, at 00:25, Jan Høydahl  wrote:
>> 
>> I think there is a product that speaks ZK api but stores everything in etcd 
>> :)
>> 
>> Also, you may want to check out 
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-18%3A+A+Solr+Kubernetes+Module+for+native+integration
>>  , an initiative that will let you use ConfigMaps for configSet files etc. 
>> I.e. somewhat more cloud native. And 
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper, 
>> if it gains traction, will embed ZK in Solr processes, making it an 
>> implementation detail that users don't need to manage separately. 
>> 
>> Jan
>> 
>>> 22. nov. 2023 kl. 14:30 skrev matthew sporleder :
>>> 
>>> It's annoying that zookeeper is a hard dependency when etcd is just
>>> sitting there and all of the stuff in zk could easily fit into
>>> ConfigMaps.
>> 
> 



Re: Receiving 405 error messages for alias deletion

2023-11-23 Thread Hakan Özler
Thank you for the prompt response guys!

Apart from this case, I've encountered other problems that are peculiar to
the v2 API in 9.4.

1. Creating a backup [1] using the v2 API sporadically returns `An exception
was not mapped due to exception mapper failure. The HTTP 500 response will be
returned`, with the stack trace below. However, the backup itself always
completes successfully.

java.lang.NullPointerException: Cannot invoke
"org.apache.solr.request.SolrQueryRequest.getParams()" because
"solrQueryRequest" is null
    at org.apache.solr.handler.api.V2ApiUtils.getMediaTypeFromWtParam(V2ApiUtils.java:92)
    at org.apache.solr.jersey.MediaTypeOverridingFilter.filter(MediaTypeOverridingFilter.java:64)
    at org.glassfish.jersey.server.ContainerFilteringStage$ResponseFilterStage.apply(ContainerFilteringStage.java:172)
    at org.glassfish.jersey.server.ContainerFilteringStage$ResponseFilterStage.apply(ContainerFilteringStage.java:139)
    at org.glassfish.jersey.process.internal.Stages.process(Stages.java:147)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:378)

2. Creating a replica [2] shows the same behavior: we are able to create the
replica, but we sometimes receive the error above.

1.
https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#backup
2.
https://solr.apache.org/guide/solr/latest/deployment-guide/replica-management.html#addreplica


On Mon, 20 Nov 2023 at 17:39, Jason Gerlowski  wrote:

> Good catch!  This is definitely a bug that I introduced as a part of
> SOLR-16393 - sorry for the trouble.
>
> The problem, counter-intuitively, is this line [1]. The annotations in
> this file are overriding the ones we need in 'DeleteAliasApi' (which is
> where the path and verb are specified).
>
> I'll get this fixed and backported for 9.4.1/9.5.  I imagine it's cold
> comfort, but I really appreciate you guys trying out the v2 APIs and
> catching bugs - sorry it ended up burning you guys this time.
>
> Best,
>
> Jason
>
> [1]
>
> https://github.com/apache/solr/blob/9463f666f40f1d72a0ac7b3c46c7f8048ef8eaa9/solr/core/src/java/org/apache/solr/handler/admin/api/DeleteAlias.java#L55
>
> On Mon, Nov 20, 2023 at 6:46 AM ufuk yılmaz 
> wrote:
>
> > +1 I also get the same error with a fresh solrcloud 9.4 installation.
> > Jetty configuration doesn’t seem to have anything to block that request.
> >
> > Any idea where the implementation for this API interface is? Is it
> > automatically generated?
> >
> >
> https://github.com/apache/solr/blob/main/solr/api/src/java/org/apache/solr/client/api/endpoint/DeleteAliasApi.java
> >
> > --ufuk yilmaz
> >
> > Sent from Mail for Windows
> >
> > From: Hakan Özler
> > Sent: Monday, November 20, 2023 1:15 PM
> > To: users@solr.apache.org
> > Subject: Receiving 405 error messages for alias deletion
> >
> > Hi there!
> >
> > While I was working on alias management in Solr, I noticed that the v2
> > alias deletion [1] is not allowed in Solr 9.4, and that there seems to be
> > a missing endpoint, causing 405s on every attempt with the following
> > request:
> >
> > List aliases:
> > $ curl -i -L -X GET 'http://localhost:8983/api/aliases'
> > {"responseHeader":{"status":0,"QTime":5},"aliases":{"my-alias":"my-cloud-collection"},"properties":{}}
> >
> > Delete V2 alias 'my-alias':
> > $ curl -i -L -X DELETE 'http://localhost:8983/api/aliases/my-alias'
> > HTTP/1.1 405 Method Not Allowed
> >
> > However, its V1 API works fine:
> > $ curl -i -L -X DELETE 'http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=my-alias'
> > {"responseHeader":{"status":0,"QTime":5}}
> >
> > If I'm not mistaken, going by the docs, do you think this might be
> > related to these improvements [2]?
> >
> > Thanks,
> > Hakan
> >
> > 1.
> >
> >
> https://solr.apache.org/guide/solr/latest/deployment-guide/alias-management.html#deletealias
> > 2. https://issues.apache.org/jira/browse/SOLR-16393
> >
> >
>


Pb between pod and replicas

2023-11-23 Thread L'HARHANT Mikael
Hello,

I have a problem with SolrCloud (8.11.1).
Normally, each K8S pod hosts a replica.
However, during installation (or reinstallation) it sometimes happens that one 
pod ends up with 2 replicas while another pod gets none.
We fix this with the admin console.
However, when the pod restarts, we find ourselves in the same situation again.
Is there some sort of cache that records the initial installation layout and is 
replayed when the pods start?

thanks
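
One way to undo a skewed placement without the admin console is the Collections
API MOVEREPLICA action. A sketch below; the collection, replica, and node names
are hypothetical placeholders, so check yours in the Cloud tab or via
CLUSTERSTATUS before running:

```shell
# Hypothetical names: adjust all three to your cluster before running.
SOLR="http://localhost:8983"
COLLECTION="mycollection"
REPLICA="core_node5"                                 # a replica on the doubled-up pod
TARGET_NODE="solr-1.solr-headless.default:8983_solr" # the pod that has no replica
URL="${SOLR}/solr/admin/collections?action=MOVEREPLICA&collection=${COLLECTION}&replica=${REPLICA}&targetNode=${TARGET_NODE}"
echo "$URL"
# curl "$URL"   # uncomment to execute against a live cluster
```

Note that this only moves the replica once; if the skew reappears on every pod
restart, the placement logic itself still needs investigating.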


Re: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source Large Language Models with Apache Solr (Alessandro) ~

2023-11-23 Thread Alessandro Benedetti
Thanks, Moriyasu for the heads up!
I'll advertise it soon on my LinkedIn and on the Apache Solr Twitter
account!
I've also contacted some of the Japanese people I know to get better engagement
:)

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github


On Wed, 22 Nov 2023 at 04:13, Moriyasu Kannami <
moriyasu.kann...@rondhuit.com> wrote:

> Hi, everyone!
>
> The Solr community in Tokyo (#SolrJP) will hold a meet-up on 1 December
> 2023 at 17:00 JST (1 December 2023 at 01:00 PDT) in Toranomon,
> Minato-ku, Tokyo. Online streaming via Google Meet is also available.
>
> The topic of this presentation is "How to Integrate Open Source Large
> Language Models with Apache Solr".
> The presentation is in English and runs about 45 min.
>
> Speaker : Alessandro Benedetti
>   Apache Lucene/Solr Committer
>   Apache Solr PMC Member
>
> If you wish to participate, please register using the following URL.
> The event page(in Japanese): https://solr.doorkeeper.jp/events/166257
>
> We look forward to your participation.
>
> We are currently looking for several speakers for lightning talks at
> this meetup who reside in the Tokyo vicinity and can attend the
> venue (Toranomon, Minato-ku) on the day of the event.
> Speakers are welcome to participate online (Google Meet) as well;
> in that case, the place of residence is not a concern.
> The theme can be anything related to Apache Lucene/Solr,
> such as case studies, new features, and more.
> If you wish to participate as an LT speaker, please contact me
> at this email address.
>
> Thanks!
>
> moriyasu
>


Re: Solr Cloud + zookeeper+ kubernetes setup

2023-11-23 Thread Dmitri Maziuk

On 11/23/23 04:59, Jan Høydahl wrote:


**FUTURE, NOT EXISTING**: With SIP-14 Solr may get a new "zookeeper" node-role 
that lets you spin up a solr Pod just for hosting an embedded ZK.


FWIW I've always had better luck spinning up 2 separate JVMs than running 2 
things in one: IME Linux has always been better at process scheduling than the 
JVM.

I'm not sure how dumping it all into one operator is an improvement if 
you're still spinning them up in separate pods.


Dima



Re: Solr Cloud + zookeeper+ kubernetes setup

2023-11-23 Thread Jan Høydahl
In k8s it’s quite simple already. But if the SIP is done, there will be similar 
operational benefits for manual installs as well. I would not hold my breath, 
though; it may never happen.

Jan Høydahl

> 23. nov. 2023 kl. 18:50 skrev Dmitri Maziuk :
> 
> On 11/23/23 04:59, Jan Høydahl wrote:
> 
>> **FUTURE, NOT EXISTING**: With SIP-14 Solr may get a new "zookeeper" 
>> node-role that lets you spin up a solr Pod just for hosting an embedded ZK.
> 
> FWIW I've always had better luck spinning up 2 separate jvms than running 2 
> things in one: IME linux has always been better at process scheduling than 
> the jvm.
> 
> I'm not sure how dumping it all in one operator is an improvement, if you're 
> still spinning them up in separate pods.
> 
> Dima
> 


RE: Pb between pod and replicas

2023-11-23 Thread ufuk yılmaz
Hi,

Are you using “rolling update” or “recreate” strategy for the deployment?

-ufuk yilmaz

Sent from Mail for Windows

From: L'HARHANT Mikael
Sent: Thursday, November 23, 2023 5:20 PM
To: users@solr.apache.org
Subject: Pb between pod and replicas

Hello,

I have a problem with SolrCloud (8.11.1).
Normally, each K8S pod hosts a replica.
However, during installation (or reinstallation) it sometimes happens that one 
pod ends up with 2 replicas while another pod gets none.
We fix this with the admin console.
However, when the pod restarts, we find ourselves in the same situation again.
Is there some sort of cache that records the initial installation layout and is 
replayed when the pods start?

thanks



RE: Pb between pod and replicas

2023-11-23 Thread L'HARHANT Mikael
Hello,

I use "RollingUpdate".
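
For reference, the two strategies being discussed are set in the Deployment
manifest roughly as below (note that if Solr runs as a StatefulSet, as the
official charts do, the equivalent field is `spec.updateStrategy`, and the
nearest analogue of Recreate is `OnDelete`):

```yaml
# Deployment update strategies side by side (pick one).
spec:
  strategy:
    type: RollingUpdate        # replace pods gradually (the default)
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
# --- or ---
#  strategy:
#    type: Recreate            # kill all old pods before starting new ones
```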




Hi,

Are you using “rolling update” or “recreate” strategy for the deployment?

-ufuk yilmaz

Sent from Mail for Windows

From: L'HARHANT Mikael
Sent: Thursday, November 23, 2023 5:20 PM
To: users@solr.apache.org
Subject: Pb between pod and replicas

Hello,

I have a problem with SolrCloud (8.11.1).
Normally, each K8S pod hosts a replica.
However, during installation (or reinstallation) it sometimes happens that one 
pod ends up with 2 replicas while another pod gets none.
We fix this with the admin console.
However, when the pod restarts, we find ourselves in the same situation again.
Is there some sort of cache that records the initial installation layout and is 
replayed when the pods start?

thanks