Re: Datacenter/Region replication

2022-08-05 Thread Jan Høydahl
A new CDCR architecture is discussed in SIP-13 
https://cwiki.apache.org/confluence/display/SOLR/SIP-13:+Cross+Data+Center+Replication
 and being worked on in the sandbox repo 
https://github.com/apache/solr-sandbox/tree/crossdc-wip

You should check with Anshum and Mark for details.

Jan

> On 1 Jul 2022, at 09:06, Eugene wrote:
> 
> Howdy!
> 
> Is there an alternative to CDCR that doesn't require changes to
> application code?
> 
> I'd like to set up replication between regions (currently running
> 8.x). It will be near real time since the data change is megabytes per
> day, and apparently CDCR was deprecated in 8.6 (and dropped in v9) and
> is generally untrustworthy.
> 
> Would it be dangerous to just create a cluster that spans DCs with
> multiple replicas in each DC?
> 
> I've seen people mentioning using Kafka to assist with this process,
> but I can't find any information or examples in the wild for this.
> 
> Thanks in advance for any advice!
> 
> -Eugene



Re: Autoscaling

2022-08-05 Thread Jan Høydahl
Hi,

With multiple tenants, scaling along the #tenants axis is simply a matter of 
adding new collections to the cluster. That should be fairly simple with K8s 
and the Solr Operator. First add N new nodes to your EKS cluster, then scale 
up your SolrCloud resource to add more pods, which will then pop up as "empty" 
Solr nodes in the cluster. Finally, create the new collection(s) with the 
desired number of shards/replicas, and let the new PlacementPlugins introduced 
in Solr 9 
(https://solr.apache.org/guide/solr/latest/configuration-guide/replica-placement-plugins.html)
 take care of placing the new collection on the best pods (typically the new 
empty ones).
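The scale-up steps above can be sketched as follows. The namespace, SolrCloud resource name, and collection parameters are all hypothetical, and the script only prints the commands it would run rather than executing them against a live cluster (kubectl scale on a SolrCloud resource assumes the operator's scale subresource is enabled):

```shell
#!/bin/sh
# Hypothetical names -- adjust to your environment.
NS="search"                 # Kubernetes namespace
SC="my-solrcloud"           # name of the SolrCloud custom resource
REPLICAS=9                  # desired pod count after scaling up
TENANT="tenant-042"         # one collection per tenant

# 1) Scale the SolrCloud resource; the Solr Operator reconciles the pods.
echo "kubectl -n $NS scale solrcloud/$SC --replicas=$REPLICAS"

# 2) Create the tenant collection; Solr 9 placement plugins choose nodes.
CREATE="http://$SC-solrcloud-common/solr/admin/collections?action=CREATE&name=$TENANT&numShards=2&replicationFactor=2"
echo "curl \"$CREATE\""
```

In practice the create call would go through whichever Solr endpoint your control plane already uses.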

Should a tenant start to see slowness due to too many docs per shard, you could 
then either migrate that collection to a new one with more shards, or build 
into your app's control plane a feature which performs SPLITSHARD + MOVEREPLICA 
on that collection. It looks like MOVEREPLICA does not support automatically 
picking targetNode using placement logic, which would have made the operation 
much simpler.
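A sketch of those two Collections API calls, with hypothetical collection, shard, and node names; the script prints the curl commands instead of executing them:

```shell
#!/bin/sh
SOLR="http://localhost:8983/solr"   # assumed Solr base URL
COLL="tenant-042"                   # hypothetical collection

# 1) Split the overloaded shard into two sub-shards (async for big shards).
SPLIT="$SOLR/admin/collections?action=SPLITSHARD&collection=$COLL&shard=shard1&async=split-$COLL"
echo "curl \"$SPLIT\""

# 2) Move one sub-shard replica off the hot node; targetNode must be
#    picked by the caller, since MOVEREPLICA does not consult placement logic.
MOVE="$SOLR/admin/collections?action=MOVEREPLICA&collection=$COLL&shard=shard1_0&sourceNode=node1:8983_solr&targetNode=node5:8983_solr"
echo "curl \"$MOVE\""
```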

Jan

> On 18 Jul 2022, at 09:00, Kaminski, Adi wrote:
> 
> Shawn - thanks for your response !
> 
> 1M index was just an example. For instance, we are planning to have multiple 
> customers on same SolrCloud cluster (each customer/tenant=collection) . Some 
> customers may have 1-2M docs (small ones),
> some will have 3-12M docs (medium ones) and some will have 20-80M docs (large 
> ones). If we migrate 100 such customers of different sizes, eventually we 
> will end up with 1B+ docs in same SolrCloud cluster (depends on ratio of 
> large vs medium vs small ones of course).
> 
> The thing is that we cannot project the growth of each customer (Solr 
> collection) other than relying on size/quota that the customer has with 
> on-prem deployment before we migrate to cloud.
> Also, we would like to avoid static tuning (#shards) followed by manual 
> operations management (such as splits, rebalancing if supported, etc.) based 
> on some rules/etc.
> 
> That's why we are asking whether some automatic capabilities exist in Solr to 
> ease the maintenance work and simplify the tuning (we understand that some 
> exist in Solr 8.11 but planned to be deprecated starting Solr 9.x)
> 
> Alternatively, if there are some other best practices to meet our use case, 
> we'll be happy to hear some direction.
> 
> Thanks in advance,
> Adi
> 
> -Original Message-
> From: Shawn Heisey 
> Sent: Monday, July 18, 2022 12:42 AM
> To: users@solr.apache.org
> Subject: Re: Autoscaling
> 
> On 7/17/22 11:25, Kaminski, Adi wrote:
>> For example, if we have 10 shards each 100k (1M total) documents size for 
>> best and optimized ingestion/query performance...adding more documents will 
>> make sense to have 11th shard, and reaching 1.1M total will make sense to 
>> add 12th one eventually.
> 
> One million total documents is actually a pretty small index, and as you were 
> told in another reply, is not big enough in most situations to require 
> sharding, unless your hardware has very little cpu/memory/storage.
> 
>> Is it reasonable to use some automation of collections API, splitting shards 
>> accordingly to some strategy (largest, oldest, etc.) ?
> 
> In a typical scenario, every shard will be approximately equal in size, and 
> will contain documents of any age.  If you have a 10 shard index and you 
> split one of the shards, then you will have 9 shards of relatively equal size 
> and two shards that are each half the size of the other 9. To correctly 
> redistribute the load, you would need to split ALL the shards, so you would 
> end up with 20 shards, or some other multiple of 10, the starting point.
> 
> In my last reply, I mentioned the implicit router.  This is the router you 
> would need to use if you want to organize your shards by something like date. 
>  But then every single document you index must indicate what shard it will 
> end up on -- there is no automatic routing.
> 
>> Aren't some out of the box capabilities in Solr Cloud search engine ? Or 
>> maybe some libraries/operators on top to simplify k8s deployments, but not 
>> only for queries and automatic PODs scaling but also automating data storage 
>> optimization (per volume, date, any other custom logic..).
> 
> I have no idea what you are asking here.
> 
> Thanks,
> Shawn
> 
> 
> 
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information conta

Re: Solr 8.10.0+ cuts away nginx-ingress basic authentication header

2022-08-05 Thread Jan Høydahl
Hi,

The Admin UI sends Ajax requests to Solr, and to do that it needs the basic 
auth credentials.
With Solr's built-in auth, the Admin UI will remember the creds and explicitly 
pass them on every Ajax request.
But with 3rd-party auth in nginx, the Admin UI will not be able to do that.
Perhaps look for a browser plugin where you can enter username + password and 
let the browser do the auth for you?
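For scripted access (as opposed to the Admin UI), the ingress credentials can still be supplied explicitly on each request. The host and credentials below are made up; the script prints the curl call it would make:

```shell
#!/bin/sh
HOST="https://solr.example.com"   # hypothetical ingress hostname
USER="solradmin"                  # hypothetical ingress basic-auth user
PASS="s3cret"

# The Authorization header nginx expects on every request.
TOKEN=$(printf '%s:%s' "$USER" "$PASS" | base64)
echo "curl -H \"Authorization: Basic $TOKEN\" \"$HOST/solr/admin/info/system?wt=json\""
# curl -u "$USER:$PASS" ... is equivalent shorthand.
```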

Jan

> On 26 Jul 2022, at 21:17, save-cho...@hotmail.com wrote:
> 
> Hello guys,
> 
> I'm using a solrcloud cluster setup in kubernetes 
> (not using the solr operator), so far I'm using an nginx-ingress with
> basic auth to secure my solr instance. This has been working fine until
> I've upgraded from version 8.9.0 to 8.10.0/8.11.2. 
> Now after updating to the mentioned versions whenever I open 
> the Solr Admin UI via my ingress I get a basic authentication 
> popup in my browser for each single request I make against the solr,
> so every single javascript that's called requires an additional basic 
> authentication. This makes the UI pretty much unusable. Does anyone 
> know if there is a workaround for this problem that doesn't involve
> using the solr built-in authentication method?
> 
> The built-in solr authentication has never been enabled so I assume 
> it is disabled by default. 
> 
> Thanks.



Re: Solr dynamic reconfiguration of zookeeper ensemble

2022-08-05 Thread Jan Høydahl
Hi,

This will not cause any issues. You should however configure all three ZK hosts 
in your ZK_HOST setting for Solr.

Beware that Solr's connection to Zookeeper does NOT support dynamic 
configuration as provided by Zookeeper, i.e. if you e.g. want to resize your ZK 
cluster you will still have to edit ZK_HOST and restart your Solr cluster, Solr 
will not pick up the changes from dynamic reconfig.
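In solr.in.sh, the static three-host setting would look like the line below (host names and the /solr chroot are illustrative):

```shell
#!/bin/sh
# All three ensemble members listed statically; after resizing ZK,
# edit this line and restart every Solr node -- dynamic reconfig is
# not picked up by Solr.
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
echo "ZK_HOST=$ZK_HOST"
```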

As for your "SocketException", it is also harmless as it is only affecting the 
UI, but it will be solved by upgrading to Solr 8.11.2, see 
https://issues.apache.org/jira/browse/SOLR-15849

Jan

> On 11 Jul 2022, at 16:58, HariBabu kuruva wrote:
> 
> Hi All,
> 
> We are using solr 8.11.1 with zk 3.2. Recently we have upgraded zk to 3.7.1.
> 
> We observe below warning on solr-ui .
> 
> "Solr ZK connection string (2 hosts) is different from the dynamic ensemble
> config (3 hosts)."
> 
> Please let us know will it cause any issue and
> Is there any option to make solr support the zk's dynamic reconfiguration .
> 
> Thanks in advance.



solr backup location 8.11.1

2022-08-05 Thread Thomas Woodard
I need to backup to a network file system to support recovery. I do not
want the index on a network file system, so just mounting /var/solr/data
isn't an option. I have attempted to set the location in the replication
handler, but it is not working. I've tried all of these configurations.

  

  optimize
  optimize

2
00:00:20

  /var/i8s/backup/solr/${i8s.environment}/${solr.core.name}

  

  

  optimize
  optimize
  /var/i8s/backup/solr/${i8s.environment}/${solr.core.name}

2
00:00:20
  

  

  optimize
  optimize

2
00:00:20
/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}
  

The backups after optimize are happening, but they are going to the default
locations, not the configured location. For example:
2022-08-04 17:19:52.053 INFO  (Thread-14) [   ] o.a.s.h.SnapShooter
Creating backup snapshot  at
file:///var/solr/data/contentPage/data/

I've confirmed that it isn't a path security issue, by verifying that all
paths are allowed:
2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
use of paths: [_ALL_]

How do I make backups go where I want?


Re: Solr dynamic reconfiguration of zookeeper ensemble

2022-08-05 Thread HariBabu kuruva
Thank you very much for the reply.

On Fri, Aug 5, 2022 at 5:43 PM Jan Høydahl  wrote:

> Hi,
>
> This will not cause any issues. You should however configure all three ZK
> hosts in your ZK_HOST setting for Solr.
>
> Beware that Solr's connection to Zookeeper does NOT support dynamic
> configuration as provided by Zookeeper, i.e. if you e.g. want to resize
> your ZK cluster you will still have to edit ZK_HOST and restart your Solr
> cluster, Solr will not pick up the changes from dynamic reconfig.
>
> As for your "SocketException", it is also harmless as it is only affecting
> the UI, but it will be solved by upgrading to Solr 8.11.2, see
> https://issues.apache.org/jira/browse/SOLR-15849
>
> Jan
>
> > On 11 Jul 2022, at 16:58, HariBabu kuruva wrote:
> >
> > Hi All,
> >
> > We are using solr 8.11.1 with zk 3.2. Recently we have upgraded zk to
> 3.7.1.
> >
> > We observe below warning on solr-ui .
> >
> > "Solr ZK connection string (2 hosts) is different from the dynamic
> ensemble
> > config (3 hosts)."
> >
> > Please let us know will it cause any issue and
> > Is there any option to make solr support the zk's dynamic
> reconfiguration .
> >
> > Thanks in advance.
>
>

-- 

Thanks and Regards,
 Hari
Mobile:9790756568


Re: solr backup location 8.11.1

2022-08-05 Thread Shawn Heisey

On 8/5/22 07:00, Thomas Woodard wrote:

   
 
   optimize
   optimize
 
 2
 00:00:20
 /var/i8s/backup/solr/${i8s.environment}/${solr.core.name}
   

The backups after optimize are happening, but they are going to the default
locations, not the configured location. For example:
2022-08-04 17:19:52.053 INFO  (Thread-14) [   ] o.a.s.h.SnapShooter
Creating backup snapshot  at
file:///var/solr/data/contentPage/data/

I've confirmed that it isn't a path security issue, by verifying that all
paths are allowed:
2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
use of paths: [_ALL_]


https://solr.apache.org/guide/8_11/index-replication.html#http-api-commands-for-the-replicationhandler

This appears to be some relevant info on that page:

  * location: Backup location. Value depends on the repository in use.
    For file system repository, location defaults to the core’s dataDir, and
    if specified, it needs to be within SOLR_HOME, SOLR_DATA_HOME, or
    the paths specified by solr.xml allowPaths.

I am not sure that you can put "location" in solrconfig.xml ... the 
reference guide lists it as a URL parameter, not a configuration 
parameter.  I have not verified this.


Once you work out whether it needs to be a URL parameter: For security 
purposes, Solr limits where it can write data that is triggered by API 
calls.  If you want it to be outside of SOLR_HOME or SOLR_DATA_HOME then 
you have to allow the path in solr.xml.


https://solr.apache.org/guide/8_11/format-of-solr-xml.html#the-solr-element
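For reference, a solr.xml fragment allowing an extra backup path might look like the sketch below (the path is illustrative; multiple paths are comma-separated, and `_ALL_` disables the check entirely):

```xml
<solr>
  <!-- Paths outside SOLR_HOME/SOLR_DATA_HOME that Solr may read/write -->
  <str name="allowPaths">/var/i8s/backup/solr</str>
</solr>
```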

Thanks,
Shawn



Re: solr backup location 8.11.1

2022-08-05 Thread Shawn Heisey

On 8/5/22 07:42, Shawn Heisey wrote:

I've confirmed that it isn't a path security issue, by verifying that all
paths are allowed:
2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
use of paths: [_ALL_]


I missed this part of your email until after I had already sent my other 
reply.  Apologies for the oversight.


I think the problem is likely that location must be a URL parameter, not 
configured in solrconfig.xml.  The code looks like it supports this 
conclusion.


Thanks,
Shawn



Re: solr backup location 8.11.1

2022-08-05 Thread Thomas Woodard
That is exactly what I was afraid of. Not being able to configure where
automated backups go seems like a pretty major oversight, though. Is anyone
aware of a solution other than creating a bunch of soft links?

On Fri, Aug 5, 2022 at 8:52 AM Shawn Heisey  wrote:

> On 8/5/22 07:42, Shawn Heisey wrote:
> > I've confirmed that it isn't a path security issue, by verifying that all
> > paths are allowed:
> > 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
> > use of paths: [_ALL_]
>
> I missed this part of your email until after I had already sent my other
> reply.  Apologies for the oversight.
>
> I think the problem is likely that location must be a URL parameter, not
> configured in solrconfig.xml.  The code looks like it supports this
> conclusion.
>
> Thanks,
> Shawn
>
>


Re: solr backup location 8.11.1

2022-08-05 Thread Dave
Can’t you just make a cron job that runs an sh file that does a cp -rf on the 
data folder with a time stamp?  The indexes are drop-in when needed

> On Aug 5, 2022, at 12:07 PM, Thomas Woodard  wrote:
> 
> That is exactly what I was afraid of. Not being able to configure where
> automated backups go seems like a pretty major oversight, though. Is anyone
> aware of a solution other than creating a bunch of soft links?
> 
>> On Fri, Aug 5, 2022 at 8:52 AM Shawn Heisey  wrote:
>> 
>>> On 8/5/22 07:42, Shawn Heisey wrote:
>>> I've confirmed that it isn't a path security issue, by verifying that all
>>> paths are allowed:
>>> 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
>>> use of paths: [_ALL_]
>> 
>> I missed this part of your email until after I had already sent my other
>> reply.  Apologies for the oversight.
>> 
>> I think the problem is likely that location must be a URL parameter, not
>> configured in solrconfig.xml.  The code looks like it supports this
>> conclusion.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Re: solr backup location 8.11.1

2022-08-05 Thread Thomas Woodard
Actually, soft links won't work either, because the snapshots aren't in a
subdirectory of data, and each one has a different name.

Cron on ec2 is a bit of a pain, but yes, that does seem like the
best solution available.

On Fri, Aug 5, 2022 at 11:15 AM Dave  wrote:

> Can’t you just make a cron job that runs an sh file that does a cp-rf on
> the data folder with a time stamp?  The indexes are drop in when needed
>
> > On Aug 5, 2022, at 12:07 PM, Thomas Woodard  wrote:
> >
> > That is exactly what I was afraid of. Not being able to configure where
> > automated backups go seems like a pretty major oversight, though. Is
> anyone
> > aware of a solution other than creating a bunch of soft links?
> >
> >> On Fri, Aug 5, 2022 at 8:52 AM Shawn Heisey 
> wrote:
> >>
> >>> On 8/5/22 07:42, Shawn Heisey wrote:
> >>> I've confirmed that it isn't a path security issue, by verifying that
> all
> >>> paths are allowed:
> >>> 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer
> Allowing
> >>> use of paths: [_ALL_]
> >>
> >> I missed this part of your email until after I had already sent my other
> >> reply.  Apologies for the oversight.
> >>
> >> I think the problem is likely that location must be a URL parameter, not
> >> configured in solrconfig.xml.  The code looks like it supports this
> >> conclusion.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: solr backup location 8.11.1

2022-08-05 Thread Shawn Heisey

On 8/5/22 10:06, Thomas Woodard wrote:

That is exactly what I was afraid of. Not being able to configure where
automated backups go seems like a pretty major oversight, though. Is anyone
aware of a solution other than creating a bunch of soft links?


The symlink idea I had (but haven't mentioned) would work pretty well if 
you were calling http://server:port/solr/CORE/replication with a script 
or manually, but not for the triggered backups. Maybe in the meantime 
you can switch to a scheduled script and provide location and name 
params on the URL instead of configuring backupAfter.  Then you could do 
anything you want to do and won't have to compile it yourself or wait 
for a new version.
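A minimal cron-able sketch of that scheduled backup call. The core name, backup mount point, and port are assumptions, and the script prints the request rather than issuing it:

```shell
#!/bin/sh
CORE="contentPage"                          # assumed core name
DEST="/var/i8s/backup/solr"                 # assumed allowed backup path
NAME="$CORE-$(date +%Y%m%d-%H%M%S)"         # unique snapshot name

# ReplicationHandler backup with explicit location and name params.
URL="http://localhost:8983/solr/$CORE/replication?command=backup&location=$DEST&name=$NAME"
echo "curl \"$URL\""
```

A crontab entry pointing at this script replaces the backupAfter trigger until the config-side location works.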


FYI, if your index is not very small, you should probably not be 
optimizing it frequently.  If the optimizes are not frequent, or an 
optimize completes very quickly, then ignore that.


Please open an enhancement issue in the Apache Jira on the SOLR 
project.  You are right that the location should be configurable as well 
as something that can be provided on the URL.  I think we need to take a 
close look at all the  parameters for the replication handler and decide 
which ones should be configurable in solrconfig.xml.


When I have some free time I will look into improving the handler.  An 
issue in Jira makes that work easier to track, and would also get your 
name in the changelog.


Thanks,
Shawn



Re: solr backup location 8.11.1

2022-08-05 Thread Dave
If you have any metal, a cron doing an rsync against ec2 may work well, hell 
you could do that with a cheap laptop that has a large hard drive running linux 
that is plugged in and doesn’t sleep. Enterprise? No. Works? Certainly 

> On Aug 5, 2022, at 12:31 PM, Thomas Woodard  wrote:
> 
> Actually, soft links won't work either, because the snapshots aren't in a
> subdirectory of data, and each one has a different name.
> 
> Cron on ec2 is a bit of a pain, but yes, that does seem like the
> best solution available.
> 
>> On Fri, Aug 5, 2022 at 11:15 AM Dave  wrote:
>> 
>> Can’t you just make a cron job that runs an sh file that does a cp-rf on
>> the data folder with a time stamp?  The indexes are drop in when needed
>> 
 On Aug 5, 2022, at 12:07 PM, Thomas Woodard  wrote:
>>> 
>>> That is exactly what I was afraid of. Not being able to configure where
>>> automated backups go seems like a pretty major oversight, though. Is
>> anyone
>>> aware of a solution other than creating a bunch of soft links?
>>> 
 On Fri, Aug 5, 2022 at 8:52 AM Shawn Heisey 
>> wrote:
 
> On 8/5/22 07:42, Shawn Heisey wrote:
> I've confirmed that it isn't a path security issue, by verifying that
>> all
> paths are allowed:
> 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer
>> Allowing
> use of paths: [_ALL_]
 
 I missed this part of your email until after I had already sent my other
 reply.  Apologies for the oversight.
 
 I think the problem is likely that location must be a URL parameter, not
 configured in solrconfig.xml.  The code looks like it supports this
 conclusion.
 
 Thanks,
 Shawn
 
 
>> 


Re: solr backup location 8.11.1

2022-08-05 Thread Thomas Woodard
Thanks for the rapid replies. I've opened
https://issues.apache.org/jira/browse/SOLR-16326 and will proceed with
scripting a scheduled backup instead.

On Fri, Aug 5, 2022 at 11:36 AM Shawn Heisey  wrote:

> On 8/5/22 10:06, Thomas Woodard wrote:
> > That is exactly what I was afraid of. Not being able to configure where
> > automated backups go seems like a pretty major oversight, though. Is
> anyone
> > aware of a solution other than creating a bunch of soft links?
>
> The symlink idea I had (but haven't mentioned) would work pretty well if
> you were calling http://server:port/solr/CORE/replication with a script
> or manually, but not for the triggered backups. Maybe in the meantime
> you can switch to a scheduled script and provide location and name
> params on the URL instead of configuring backupAfter.  Then you could do
> anything you want to do and won't have to compile it yourself or wait
> for a new version.
>
> FYI, if your index is not very small, you should probably not be
> optimizing it frequently.  If the optimizes are not frequent, or an
> optimize completes very quickly, then ignore that.
>
> Please open an enhancement issue in the Apache Jira on the SOLR
> project.  You are right that the location should be configurable as well
> as something that can be provided on the URL.  I think we need to take a
> close look at all the  parameters for the replication handler and decide
> which ones should be configurable in solrconfig.xml.
>
> When I have some free time I will look into improving the handler.  An
> issue in Jira makes that work easier to track, and would also get your
> name in the changelog.
>
> Thanks,
> Shawn
>
>


Re: solr backup location 8.11.1

2022-08-05 Thread Gus Heck
Just looked at some other handler configurations, I think you may suffer
from a typo... should


  /var/i8s/backup/solr/${i8s.environment}/${solr.core.name}


have been


  /var/i8s/backup/solr/${i8s.environment}/${solr.core.name}

(note the s)

On Fri, Aug 5, 2022 at 1:05 PM Thomas Woodard  wrote:

> Thanks for the rapid replies. I've opened
> https://issues.apache.org/jira/browse/SOLR-16326 and will proceed with
> scripting a scheduled backup instead.
>
> On Fri, Aug 5, 2022 at 11:36 AM Shawn Heisey  wrote:
>
> > On 8/5/22 10:06, Thomas Woodard wrote:
> > > That is exactly what I was afraid of. Not being able to configure where
> > > automated backups go seems like a pretty major oversight, though. Is
> > anyone
> > > aware of a solution other than creating a bunch of soft links?
> >
> > The symlink idea I had (but haven't mentioned) would work pretty well if
> > you were calling http://server:port/solr/CORE/replication with a script
> > or manually, but not for the triggered backups. Maybe in the meantime
> > you can switch to a scheduled script and provide location and name
> > params on the URL instead of configuring backupAfter.  Then you could do
> > anything you want to do and won't have to compile it yourself or wait
> > for a new version.
> >
> > FYI, if your index is not very small, you should probably not be
> > optimizing it frequently.  If the optimizes are not frequent, or an
> > optimize completes very quickly, then ignore that.
> >
> > Please open an enhancement issue in the Apache Jira on the SOLR
> > project.  You are right that the location should be configurable as well
> > as something that can be provided on the URL.  I think we need to take a
> > close look at all the  parameters for the replication handler and decide
> > which ones should be configurable in solrconfig.xml.
> >
> > When I have some free time I will look into improving the handler.  An
> > issue in Jira makes that work easier to track, and would also get your
> > name in the changelog.
> >
> > Thanks,
> > Shawn
> >
> >
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: solr backup location 8.11.1

2022-08-05 Thread Thomas Woodard
Yup, I absolutely did typo when I tried to do it as a default. I'll update
my issue to correct that.

On Fri, Aug 5, 2022 at 12:31 PM Gus Heck  wrote:

> Just looked at some other handler configurations, I think you may suffer
> from a typo... should
>
> 
>   /var/i8s/backup/solr/${i8s.environment}/${
> solr.core.name}
> 
>
> have been
>
> 
>   /var/i8s/backup/solr/${i8s.environment}/${
> solr.core.name}
> 
> (note the s)
>
> On Fri, Aug 5, 2022 at 1:05 PM Thomas Woodard  wrote:
>
> > Thanks for the rapid replies. I've opened
> > https://issues.apache.org/jira/browse/SOLR-16326 and will proceed with
> > scripting a scheduled backup instead.
> >
> > On Fri, Aug 5, 2022 at 11:36 AM Shawn Heisey 
> wrote:
> >
> > > On 8/5/22 10:06, Thomas Woodard wrote:
> > > > That is exactly what I was afraid of. Not being able to configure
> where
> > > > automated backups go seems like a pretty major oversight, though. Is
> > > anyone
> > > > aware of a solution other than creating a bunch of soft links?
> > >
> > > The symlink idea I had (but haven't mentioned) would work pretty well
> if
> > > you were calling http://server:port/solr/CORE/replication with a
> script
> > > or manually, but not for the triggered backups. Maybe in the meantime
> > > you can switch to a scheduled script and provide location and name
> > > params on the URL instead of configuring backupAfter.  Then you could
> do
> > > anything you want to do and won't have to compile it yourself or wait
> > > for a new version.
> > >
> > > FYI, if your index is not very small, you should probably not be
> > > optimizing it frequently.  If the optimizes are not frequent, or an
> > > optimize completes very quickly, then ignore that.
> > >
> > > Please open an enhancement issue in the Apache Jira on the SOLR
> > > project.  You are right that the location should be configurable as
> well
> > > as something that can be provided on the URL.  I think we need to take
> a
> > > close look at all the  parameters for the replication handler and
> decide
> > > which ones should be configurable in solrconfig.xml.
> > >
> > > When I have some free time I will look into improving the handler.  An
> > > issue in Jira makes that work easier to track, and would also get your
> > > name in the changelog.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


Re: solr backup location 8.11.1

2022-08-05 Thread Shawn Heisey

On 8/5/22 11:56, Thomas Woodard wrote:

Yup, I absolutely did typo when I tried to do it as a default. I'll update
my issue to correct that.


It will be interesting to see whether fixing the typo makes it work.  
Sometimes the code is hard to decipher, and it is always possible that 
it does apply the defaults in the way you're expecting.


Thanks,
Shawn



Re: solr backup location 8.11.1

2022-08-05 Thread Gus Heck
If it doesn't apply the defaults that's the bug right there I think.

On Fri, Aug 5, 2022 at 2:10 PM Shawn Heisey  wrote:

> On 8/5/22 11:56, Thomas Woodard wrote:
> > Yup, I absolutely did typo when I tried to do it as a default. I'll
> update
> > my issue to correct that.
>
> It will be interesting to see whether fixing the typo makes it work.
> Sometimes the code is hard to decipher, and it is always possible that
> it does apply the defaults in the way you're expecting.
>
> Thanks,
> Shawn
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Solr 8.11 ( Queries taking longer to respond on startup)

2022-08-05 Thread Surya R
hi,

We recently migrated from solr 6.x to 8.11.  We reindexed the data on solr
8.11 binaries.

We have a master slave configuration. The indexing happens on the master (
LEADER)and replicates to the slaves ( follower)

We have around 8 cores on each server; they are like 8 collections,
serving different needs.

When the Solr daemon is restarted, the cores do appear on the admin
console, but when a query is hit against a core immediately, we don't get
a response; it spins for about 20 seconds, and only after I see the
message below in the log do I get the results. Why does it take longer in
Solr 8.x to serve queries after a restart of Solr?

2022-08-04 06:17:10.764 INFO  (searcherExecutor-38-thread-1-processing-
*x:customer_core*) [   *x:customer_core*] o.a.s.c.SolrCore [
*x:customer_core*]  Registered new searcher autowarm time: 0 ms

This never happened in Solr 6.x: when the machine was rebooted or the Solr
process was restarted, the moment a core showed up it was able to serve
queries.

In Solr 8.11 the behavior seems to have changed. Is there a config change in
solrconfig.xml that I need to tweak to make this behave like Solr 6.x?
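One common mitigation for slow first queries (a sketch, not necessarily the root cause here) is to register a firstSearcher warming query in solrconfig.xml so the expensive first search runs before the searcher is exposed, or to set useColdSearcher so requests are served by the not-yet-warmed searcher:

```xml
<!-- solrconfig.xml, inside <query>: warm the very first searcher -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
<!-- Or let requests use the searcher before warming finishes: -->
<useColdSearcher>true</useColdSearcher>
```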


thank you
Aditya


Re: Solr 8.11 ( Queries taking longer to respond on startup)

2022-08-05 Thread Shawn Heisey

On 8/5/22 14:21, Surya R wrote:

When the Solr daemon is restarted, the cores do appear on the admin
console, but when a query is hit against a core immediately, we don't get
a response; it spins for about 20 seconds, and only after I see the
message below in the log do I get the results. Why does it take longer in
Solr 8.x to serve queries after a restart of Solr?


I would be interested in seeing the entire solr.log file from server 
start to successful slow query, to see if there's anything in there that 
can say what Solr is doing during that time.


Here is an answer to an almost identical question that covers things I 
wouldn't have thought of:


https://stackoverflow.com/questions/68709357/why-first-occurrence-of-a-query-takes-more-time-than-the-following-on-solrcloud

Are you in SolrCloud mode?  If so, is there more than one Solr node in 
the cluster?  What are the sizes of those 8 cores?  Both on-disk size 
and document counts, deleted and max.  What are the total memory and 
process memory characteristics of the server? Getting a screenshot of 
the processes as described at the following URL could be helpful:


https://cwiki.apache.org/confluence/display/solr/solrperformanceproblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue

Note that the mailing list eats almost all attachments, so you can't 
attach the screenshot to an email.  You'll need to put it on a file 
sharing site and send us a URL to access it.


Usually upgrading two major versions yields better performance, not 
worse.  We need more information to provide an answer.


Thanks,
Shawn



Re: Solr update only if field differs

2022-08-05 Thread Vincenzo D'Amore
Unfortunately, in my architecture I cannot rely on a database or on an
updated/created time field. There is a potentially infinite stream of
documents with a possibly huge amount of duplication.
So avoiding indexing of the duplicate documents should (I suppose) improve
the performance.
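For reference, the processor presumably being discussed is Solr's de-duplication SignatureUpdateProcessorFactory; a sketch of such a chain follows (the field names in `fields` are hypothetical):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Hash of the listed fields becomes the uniqueKey, so exact
         duplicates overwrite instead of accumulating. -->
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Note this turns duplicates into overwrites; as discussed above, an overwrite costs about the same as indexing a new document.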

On Fri, 5 Aug 2022 at 01:10, Dave  wrote:

> ——
>
> At this point it would be interesting to see how this Processor would
> increase the indexing performance when you have many duplicates
>
> - When it comes to indexing performance with duplicates, there isn’t any
> difference from a new document: the original is marked as deleted, and the
> new one replaces it.  Update isn’t a real thing; the first operation is
> pretty much a joke speed-wise, the second is as fast as indexing, and Solr
> will manage the segments as needed when it determines to do so.  Your best
> bet is to manage this code-wise: have an updated/created time field and,
> when indexing, only run on those that fit your automated schedule against
> such fields.  With a database this takes like 5 minutes to write into your
> indexer, and I can promise you will be faster than trying to use a built-in
> solr operation to figure it out for you.
>
> If I’m wrong I would love to know, but indexing code logic will always be
> faster than relying on a built in server function for these sorts of
> things.
>
>
>
>
>
> > On Aug 4, 2022, at 6:41 PM, Vincenzo D'Amore  wrote:
> >
> >
> > At this point it would be interesting to see how this Processor would
> > increase the indexing performance when you have many duplicates
>
-- 
Vincenzo D'Amore


Optimize not working, Solr version 9.1.0-SNAPSHOT

2022-08-05 Thread Shawn Heisey
While looking into a problem described on the #solr slack channel, I 
tried to have Solr optimize my core. It seems to have completely ignored 
the command.  I am running 9.1.0-SNAPSHOT, compiled from branch_9x.


The user on slack also tried to optimize their index, running version 
8.11.2, and that system behaved the same as mine.


I'm running this Java version, installed as Ubuntu packages:

openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu120.04, 
mixed mode, sharing)


The user on slack is running:
Oracle Corporation OpenJDK 64-Bit Server VM 11.0.13 11.0.13+8

I have switched to the Shenandoah GC; I am told that the slack user has 
not made any major tweaks to the Solr startup.  They are using the 
Docker container; I installed using the included service installer script.


Instead of optimize, they were able to do expungeDeletes, which dropped 
their index from 32GB to 24GB.  My index is only 660MB.  I have no 
deleted docs.


I have been trying to unravel the code to see if maybe optimize has been 
completely disabled (as well as being removed from the admin UI) but 
have not been able to figure out where to look.  I have an update processor 
chain defined, but it does not have the processor that ignores commits 
and optimizes:

  (processor chain XML stripped in the archive)

Below is the last api call I used to try the optimize.  At first I was 
not including the commit option.  Adding it did not help:


curl -v 
"http://localhost:8983/solr/dovecot/update?optimize=true&commit=true"


Does anyone have any ideas why the optimize is not working? Am I doing 
the api call right?


Thanks,
Shawn



Re: Optimize not working, Solr version 9.1.0-SNAPSHOT

2022-08-05 Thread Rahul Goswami
I recently hit this problem on 8.11.1. It was a tiny test index with 2
segments. One of the segments *might* have been from 7.x.  I wanted to
optimize and rewrite the index into a single segment in 8.x, but optimize
didn’t work.

In this case though, before optimize, numDocs=maxDocs. I thought that
might be the cause: probably Lucene determines there is no benefit to
optimizing. So I induced a delete by doing an atomic update on an
existing document, then ran optimize with
commit=true&optimize=true&maxSegments=1. It worked this time.

Not sure if this is as designed when numDocs=maxDocs, but was definitely
strange at first.
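The call that worked can be sketched as a small helper (the base URL and core name are illustrative; maxSegments is the documented Solr parameter that caps the post-merge segment count):

```python
import urllib.parse

def optimize_url(base="http://localhost:8983", core="dovecot",
                 max_segments=1):
    """Build the update URL for a forced merge, mirroring the
    commit=true&optimize=true&maxSegments=1 call described above."""
    params = urllib.parse.urlencode({
        "optimize": "true",
        "commit": "true",
        "maxSegments": str(max_segments),
    })
    return f"{base}/solr/{core}/update?{params}"

url = optimize_url()
```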

-Rahul


On Fri, Aug 5, 2022 at 6:46 PM Shawn Heisey  wrote:

>


Test with DMARC reject enabled

2022-08-05 Thread Shawn Heisey
I have just enabled DMARC rejection for my domain.  Hoping that messages 
to the list can still get through.




RE: IllegalArgumentException: Unknown directory

2022-08-05 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
This error has happened again. Does anyone yet have any explanation or 
suggestion?

-Original Message-
From: Oakley, Craig (NIH/NLM/NCBI) [C] 
Sent: Monday, May 02, 2022 2:29 PM
To: users@solr.apache.org
Subject: Re: IllegalArgumentException: Unknown directory

This has happened several more times, and I have noticed something else which 
might be a clue:

The problem happened over the weekend; and the first erroneous complaint about 
/data/solr/subportal1/run_sel_cache_shard1_replica_n3/data/snapshot_metadata 
(as though it did not exist) occurred 17 minutes after 
/data/solr/subportal1/run_sel_cache_shard1_replica_n3/data/replication.properties
 was updated. Last month, the problem occurred just 40 seconds after 
replication.properties was updated.

What known connections might there be between replication.properties and 
snapshot_metadata?
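One way to make that correlation concrete is to compute the gap between the replication.properties update and the first error straight from the timestamps. A minimal sketch (the example timestamps below are illustrative, chosen 40 seconds apart to match last month's incident):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def gap_seconds(properties_updated, first_error):
    """Seconds from the replication.properties update to the first
    'Unknown directory' complaint, both as solr.log-style timestamps."""
    delta = (datetime.strptime(first_error, FMT)
             - datetime.strptime(properties_updated, FMT))
    return delta.total_seconds()

# Illustrative timestamps, 40 seconds apart like last month's incident:
gap = gap_seconds("2022-03-25 08:47:18", "2022-03-25 08:47:58")
```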

-Original Message-
From: matthew sporleder  
Sent: Monday, March 28, 2022 9:41 AM
To: users@solr.apache.org
Subject: Re: [EXTERNAL] Re: IllegalArgumentException: Unknown directory



My only other guess (and I mean wild guess) is that because the error is
for a *lock* that it is actually a timeout or similar locking error with a
bad error message.

On Fri, Mar 25, 2022 at 11:11 AM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:

> Thanks for the quick reply
>
> grep -i -c OutOfMemory solr.log.20220* shows zero
>
> Nothing new in /var/log/dmesg since reboot of the host a couple weeks ago
>
> Let me know if you have any other suggestions
>
> Thanks again
>
> -Original Message-
> From: matthew sporleder 
> Sent: Friday, March 25, 2022 11:00 AM
> To: users@solr.apache.org
> Subject: [EXTERNAL] Re: IllegalArgumentException: Unknown directory
>
>
>
> Is there an OOM anywhere in that log?  I've definitely seen java lose track
> of things during a slow-moving oom.
>
> Also definitely check dmesg for anything in that same timeframe if you
> still have the logs.
>
> On Fri, Mar 25, 2022 at 10:58 AM Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
>
> > I have a core which ceased responding either to select or to admin/core:
> > restarting the Solr instance resolved the problem, but I am wondering
> > whether there is some configuration which may need to be tweaked. Below
> is
> > a portion of solr.log from the time when the problem began. Please note
> > that the directory
> >
> /data/solr/subportal1/run_sel_cache_shard1_replica_n3/data/snapshot_metadata
> > does indeed exist
> >
> > 2022-03-25 08:47:58.526 INFO  (qtp1847637306-38437) [c:run_sel_index
> > s:shard1 r:core_node4 x:run_sel_index_shard1_replica_n3]
> o.a.s.c.S.Request
> > [run_sel_index_shard1_replica_n3]  webapp=/solr path=/select
> >
> params={q={!join+from%3Dacc_ref+to%3Dacc_s+fromIndex%3Drun_sel_cache}list_guid:6ccb6d6731f557a9fd3edb34ad637add&facet.limit=2&facet.field=datastore_provider_ss&facet.field=datastore_region_ss&facet.field=datastore_filetype_ss&facet.field=acc_s&facet.field=sra_study_s&facet.field=experiment_s&facet.field=bioproject_s&facet.field=biosample_s&facet.field=sample_acc_s&facet.field=sra_sample_s&facet.field=consent_s&facet.field=gap_accession_sam_ss&facet.field=libraryselection_s&facet.field=librarysource_s&facet.field=librarylayout_s&facet.field=platform_s&facet.field=submission_id_s&facet.field=assemblyname_s&facet.field=submission_id_run_s&facet.field=instrument_s&facet.field=bytes_l&facet.field=bases_l&facet.field=mbytes_l&facet.field=mbases_l&start=0&facet.mincount=1&rows=0&wt=json&facet=on}
> > hits=94 status=0 QTime=2610
> > 2022-03-25 08:47:58.526 INFO  (qtp1847637306-38437) [c:run_sel_index
> > s:shard1 r:core_node4 x:run_sel_index_shard1_replica_n3] o.a.s.c.SolrCore
> > [run_sel_cache_shard1_replica_n3]  CLOSING SolrCore
> > org.apache.solr.core.SolrCore@59c22833
> > 2022-03-25 08:47:58.527 INFO  (qtp1847637306-38437) [c:run_sel_index
> > s:shard1 r:core_node4 x:run_sel_index_shard1_replica_n3]
> > o.a.s.m.SolrMetricManager Closing metric reporters for
> > registry=solr.core.run_sel_cache.shard1.replica_n3, tag=SolrCore@59c22833
> > 2022-03-25 08:47:58.531 INFO  (qtp1847637306-38437) [c:run_sel_index
> > s:shard1 r:core_node4 x:run_sel_index_shard1_replica_n3]
> > o.a.s.m.r.SolrJmxReporter Closing reporter
> > [org.apache.solr.metrics.reporters.SolrJmxReporter@2b7f5d5e: rootName =
> > null, domain = solr.core.run_sel_cache.shard1.replica_n3, service url =
> > null, agent id = null] for registry
> > solr.core.run_sel_cache.shard1.replica_n3 /
> > com.codahale.metrics.MetricRegistry@5211d667
> > 2022-03-25 08:47:58.568 INFO  (qtp