Re: Schema change without restart Solr 6.1.0

2021-11-23 Thread Tulsi Das
Yes, a restart is not needed for schema changes. The exceptions are certain
configuration changes, such as the dataDir location and IndexWriter-related
settings in solrconfig.xml, which cannot be changed and made active with a
simple RELOAD action.

Note: you might have to reindex the data in your collection after the
reload, depending on what changes you make to schema.xml or solrconfig.xml.

Did you get any error while uploading or reloading the collection?

Ideally, once the config upload is done, you can verify that your changes
appear in the Solr Admin UI, and then reload the collection for those
changes to take effect.
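In SolrCloud mode the sequence is roughly the following (a sketch only; the ZooKeeper host, config set name, collection name, and paths are placeholders for your environment):

```shell
# Sketch only: host, config set name, collection name, and paths are placeholders.
ZK_HOST="localhost:2181"
CONFIG_NAME="forms"
COLLECTION="forms"

# 1) Upload the edited config set to ZooKeeper:
# bin/solr zk upconfig -z "$ZK_HOST" -n "$CONFIG_NAME" -d /path/to/conf

# 2) Reload the collection so the new config takes effect (no restart needed):
RELOAD_URL="http://localhost:8983/solr/admin/collections?action=RELOAD&name=${COLLECTION}"
# curl "$RELOAD_URL"
echo "$RELOAD_URL"
```

After the reload, the Admin UI should show the updated config for the collection.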



On Tue, Nov 23, 2021 at 12:48 PM VIshal Patel 
wrote:

> Thanks for reply.
>
> We are using Solr cloud mode [with ZooKeeper]. Generally, when we change the
> schema or solrconfig, we first run the upconfig command and then restart
> Solr.
>
> We have used the classic schema factory.
> The line below is in our solrconfig.xml:
> <schemaFactory class="ClassicIndexSchemaFactory"/>
> Is it not necessary to restart the Solr server for ClassicIndexSchemaFactory?
> Will it work after reloading the collection?
>
> Regards,
> VIshal
> 
> From: Shawn Heisey 
> Sent: Monday, November 22, 2021 10:05 PM
> To: users@solr.apache.org 
> Subject: Re: Schema change without restart Solr 6.1.0
>
> On 11/22/21 2:24 AM, VIshal Patel wrote:
> > I am using Solr 6.1.0. In production environment I want to do schema
> change without restarting Solr.
> > I have tried with
> http://1.1.1.1:7983/solr/admin/cores?action=RELOAD&core=forms but It did
> not work.
> > (
> https://stackoverflow.com/questions/23782123/how-do-i-change-schema-xml-without-restarting-solr
> )
> >
> > Our directory like this
> > \server\solr\forms\conf\schema.xml
> > \server\solr\forms\conf\solrconfig.xml
> >
> > Is it possible to change the schema without a Solr restart?
>
> Reloading the index SHOULD activate schema changes without a full restart.
>
> Is Solr in cloud mode (solr+zookeeper)?  If it is, then as Tulsi Das
> mentioned, you have to change the config in zookeeper, not the one on
> the disk.  The name of your core would suggest that you're NOT in cloud
> mode, but I cannot be sure about that.
>
> Whether you're in cloud mode or not... are you using the classic schema
> factory in solrconfig.xml?  If not, then the name of the active schema
> file is most likely managed-schema (with no extension) rather than
> schema.xml.
>
> Thanks,
> Shawn
>
>
>


Tesseract.exe

2021-11-23 Thread Philipp Wawrina
How many processes of Tesseract.exe should run simultaneously? How can I
configure that?
br

Gesendet von Mail für Windows



Re: Tesseract.exe

2021-11-23 Thread Jan Høydahl
Hi,

I assume you are using SolrCell /update/extract to process images and extract
text using Tesseract OCR.
However, tesseract.exe is not part of Solr, so I believe your question should be
directed towards either the Tesseract community, or perhaps Tika, or whoever
provided you with the application you are running.

Jan

> 23. nov. 2021 kl. 11:57 skrev Philipp Wawrina :
> 
> How many processes of Tesseract.exe should run simultaneously? How can I
> configure that?
> br
> 
> Gesendet von Mail für Windows
> 



Enabling s3 repository clusterwide?

2021-11-23 Thread Michael Conrad

Hello all,

I'm trying to set up backups to S3, and see that the plugin must be enabled.

The example XML fragments appear to be collection-specific, but I
want to enable the plugin globally across all nodes at once.


What do I need to specify where?

So far, I've only found information about collection-level
configuration.
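For what it's worth, in Solr 8.10+ the S3 backup repository is declared node-wide in solr.xml rather than per collection. A minimal sketch along these lines (the bucket name and region are placeholders, and this assumes the S3 repository module and its dependencies are on each node's classpath):

```xml
<!-- Sketch only: bucket name and region are placeholders. -->
<solr>
  <backup>
    <repository name="s3" class="org.apache.solr.s3.S3BackupRepository" default="false">
      <str name="s3.bucket.name">my-solr-backups</str>
      <str name="s3.region">us-east-1</str>
    </repository>
  </backup>
</solr>
```

Because solr.xml is read at node startup, every node picks this up, and the repository can then be referenced by name via the `repository` parameter of the Collections API BACKUP call.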


-Mike/NewsRx

EndOfStreamException: Unable to read additional data from server..., likely server has closed socket

2021-11-23 Thread Michael Conrad
Since our upgrade to 8.11.0 we've been getting a lot of warnings with 
"likely server has closed socket" as part of the description. What do I 
need to look for in our system configs that would be causing this?


WARN  - 2021-11-23 09:47:49.997; org.apache.zookeeper.ClientCnxn; An 
exception was thrown while closing send thread for session 
0x11bef34236f. => EndOfStreamException: Unable to read additional 
data from server sessionid 0x11bef34236f, likely server has closed 
socket
    at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
additional data from server sessionid 0x11bef34236f, likely server 
has closed socket
    at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) 
~[zookeeper-3.6.2.jar:3.6.2]
    at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) 
~[zookeeper-3.6.2.jar:3.6.2]
    at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) 
~[zookeeper-3.6.2.jar:3.6.2]



Help appreciated,
-mike/NewsRx

Searcher and autoSoftCommits + softCommit

2021-11-23 Thread Dan Rosher
Hi,

It seems to me that <openSearcher>false</openSearcher> is not being
honored, or does a softCommit always happen on an autoCommit?

Cache reloads seem to coincide with solr.autoCommit.maxTime:

We have the following solrconfig snippet:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>



I see the following in the logs:

2021-11-23 15:10:38.976 INFO  (commitScheduler-643-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2021-11-23 15:10:39.016 INFO  (commitScheduler-643-thread-1) [   ]
o.a.s.s.SolrIndexSearcher Opening [Searcher@40ed009c[jobs-2021-11-2
3_shard1_replica_n1] main]
2021-11-23 15:10:39.016 INFO  (commitScheduler-643-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2021-11-23 15:10:45.079 INFO  (commitScheduler-642-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearc
her=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2021-11-23 15:10:45.079 INFO  (commitScheduler-642-thread-1) [   ]
o.a.s.u.SolrIndexWriter Calling setCommitData with IW:org.apache.so
lr.update.SolrIndexWriter@5c52f993 commitCommandVersion:0
2021-11-23 15:10:45.396 INFO  (commitScheduler-642-thread-1) [   ]
o.a.s.s.SolrIndexSearcher Opening [Searcher@6e86071d[jobs-2021-11-2
3_shard1_replica_n1] realtime]
2021-11-23 15:10:45.397 INFO  (commitScheduler-642-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 end_commit_flush

So commitScheduler-643-thread-1 seems to set openSearcher=true and
softCommit=true for 'main' (main is the real client-facing searcher?), and
commitScheduler-642-thread-1 seems to set openSearcher=false and
softCommit=false for 'realtime' (I think realtime is for bookkeeping?).

Many thanks in advance,
Dan


Re: Searcher and autoSoftCommits + softCommit

2021-11-23 Thread Shawn Heisey

On 11/23/21 8:31 AM, Dan Rosher wrote:

It seems to me that <openSearcher>false</openSearcher> is not being
honored, or does a softCommit always happen on an autoCommit?

Cache reloads seem to coincide with solr.autoCommit.maxTime:

We have the following solrconfig snippet:

 
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>


The autoCommit, if it fires, is always a hard commit.  If you see a soft 
commit, it did NOT come from autoCommit.



2021-11-23 15:10:38.976 INFO  (commitScheduler-643-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


This is a soft commit.  It was NOT fired by autoCommit.  It looks like 
it was either fired by autoSoftCommit or by a commitWithin parameter on 
an update request.  If you have not defined the system property named 
"solr.autoSoftCommit.maxTime" when Solr starts, then I think it was most 
likely commitWithin.  Note that if you use the "Documents" tab in the 
admin UI to send documents for indexing, it has a "Commit Within" box 
that defaults to 1 second, and the value entered there is sent as a 
commitWithin parameter.
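To illustrate, an update carrying a commitWithin parameter can be sent like this (a sketch; the host and collection name are placeholders):

```shell
# Sketch only: host and collection name are placeholders.
# commitWithin=1000 asks Solr to make the update searchable within 1 second,
# which by default is performed as a soft commit, even when autoSoftCommit
# is disabled.
UPDATE_URL="http://localhost:8983/solr/mycollection/update?commitWithin=1000"
# curl "$UPDATE_URL" -H 'Content-Type: application/json' -d '[{"id":"1"}]'
echo "$UPDATE_URL"
```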



2021-11-23 15:10:45.079 INFO  (commitScheduler-642-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearc
her=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}



This is a hard commit, with openSearcher set to false.  This one was 
almost assuredly fired by autoCommit.



(I think realtime is for bookkeeping?)



The realtime searcher is for the handler usually assigned the "/get" URL 
path, which only searches on the uniqueKey field, and can search 
uncommitted documents as well as committed documents. This is an implicit 
handler, which means that it exists even if it is not configured in 
solrconfig.xml.  I believe that handler searches uncommitted docs by 
looking at transaction logs, but I do not know that for sure.  No action 
or config is necessary for a realtime searcher to be opened - Solr 
handles all that automatically.  It is likely that when autoCommit 
fires, a brand new realtime searcher is opened -- because a hard commit 
will cause changes in the transaction logs, a searcher that searches 
them must be updated.
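To illustrate the difference, a real-time get request goes to the /get handler and is served by the realtime searcher (a sketch; the host, collection name, and id value are placeholders):

```shell
# Sketch only: host, collection name, and id value are placeholders.
# /get looks a document up by its uniqueKey and can return updates that
# are not yet visible to the main searcher.
RTG_URL="http://localhost:8983/solr/mycollection/get?id=12345"
# curl "$RTG_URL"
echo "$RTG_URL"
```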


Thanks,
Shawn




solr-operator formatting of cpu resource

2021-11-23 Thread Joel Bernstein
As part of the collections operator we are working on, we need to compare
the value of the collection's cpu resource with the value of the underlying
SolrCloud cpu resource. What we found is that the SolrCloud cpu resource is
being formatted by the Solr operator, which makes it tricky to do this
comparison. I believe it's using the following Go function to format the
cpu value. I just wanted to confirm that if we port this logic to Java
we'll be able to make this comparison.

// NewMilliQuantity returns a new Quantity representing the given
// value * 1/1000 in the given format. Note that BinarySI formatting
// will round fractional values, and will be changed to DecimalSI for
// values x where (-1 < x < 1) && (x != 0).
func NewMilliQuantity(value int64, format Format) *Quantity {
    return &Quantity{
        i:      int64Amount{value: value, scale: -3},
        Format: format,
    }
}





Joel Bernstein
http://joelsolr.blogspot.com/


Re: solr-operator formatting of cpu resource

2021-11-23 Thread Joel Bernstein
A little more testing has shown that the cpu value is only reformatted if
it's a multiple of 1000.

I'm not sure if the solr-operator can control this or if this is part of the
Go operator framework. But if it's controllable, ideally we would
not reformat the cpu value.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 23, 2021 at 12:14 PM Joel Bernstein  wrote:

> As part of the collections operator we are working on we need to compare
> the value of the collection's cpu resource with the value of the underlying
> SolrCloud cpu resource. What we found is that the SolrCloud cpu resource is
> being formatted by the Solr operator which makes it tricky to do this
> comparison. I believe it's using the following go function to format the
> cpu value. I just wanted to confirm that if we port this logic to java
> we'll be able make this comparison.
>
> // NewMilliQuantity returns a new Quantity representing the given
> // value * 1/1000 in the given format. Note that BinarySI formatting
> // will round fractional values, and will be changed to DecimalSI for
> // values x where (-1 < x < 1) && (x != 0).
> func NewMilliQuantity(value int64, format Format) *Quantity {
>   return &Quantity{
>   i:  int64Amount{value: value, scale: -3},
>   Format: format,
>   }
> }
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>


Performance Implications of solr.ICUCollationField

2021-11-23 Thread Amrit Sarkar
Hi users,

I was looking at the performance implications of sorting on text-based fields
(non-DocValues fields) and wondering whether *solr.ICUCollationField* (
https://solr.apache.org/guide/8_8/language-analysis.html#unicode-collation)
introduces issues with thread management and heap consumption (from
caches). Is it safe to use in high-traffic scenarios, or is some other way
of handling multiple languages recommended?
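For reference, the usual pattern is to define a collated sort field with docValues so that sorting does not put sort keys on the heap. A sketch (the field, type, and source names here are made up, and exact attribute support should be checked against your Solr version):

```xml
<!-- Sketch only: names are placeholders. A docValues-backed collated field
     keeps sort keys in the docValues structures rather than in heap-resident
     caches, which is the usual mitigation for sort-related heap pressure. -->
<fieldType name="collated_en" class="solr.ICUCollationField"
           locale="en" strength="primary" docValues="true"/>
<field name="title_sort" type="collated_en" indexed="false" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

One such field per language/locale is typically defined, and the query sorts on the field matching the user's locale.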

Amrit Sarkar
Engineer | Search and Kubernetes
https://seamadic.com/
Twitter https://twitter.com/sarkaramrit2
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: solr-operator formatting of cpu resource

2021-11-23 Thread Joel Bernstein
The more I think about this issue, the more I think the
Solr-Operator shouldn't be changing CR objects. My understanding is
that the operator takes its direction from the CR object and not the other
way around. So for the operator to write back to a CR object, even with
equivalent values, breaks that contract.

I'll create an issue for this so we can discuss how to handle this.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 23, 2021 at 1:13 PM Joel Bernstein  wrote:

> A little more testing has shown that the cpu value is only reformatted if
> it's a multiple of 1000.
>
> I'm not sure if the solr-operator can control this or if this part of the
> go operator framework. But if it's controllable ideally we would probably
> not reformat the cpu.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Nov 23, 2021 at 12:14 PM Joel Bernstein 
> wrote:
>
>> As part of the collections operator we are working on we need to compare
>> the value of the collection's cpu resource with the value of the underlying
>> SolrCloud cpu resource. What we found is that the SolrCloud cpu resource is
>> being formatted by the Solr operator which makes it tricky to do this
>> comparison. I believe it's using the following go function to format the
>> cpu value. I just wanted to confirm that if we port this logic to java
>> we'll be able make this comparison.
>>
>> // NewMilliQuantity returns a new Quantity representing the given
>> // value * 1/1000 in the given format. Note that BinarySI formatting
>> // will round fractional values, and will be changed to DecimalSI for
>> // values x where (-1 < x < 1) && (x != 0).
>> func NewMilliQuantity(value int64, format Format) *Quantity {
>>  return &Quantity{
>>  i:  int64Amount{value: value, scale: -3},
>>  Format: format,
>>  }
>> }
>>
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>


Re: solr-operator formatting of cpu resource

2021-11-23 Thread Joel Bernstein
I opened and closed an issue for this. It turns out this happens with
standard Deployment objects as well. So it's something other than the
Solr-operator. Sorry for the noise.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 23, 2021 at 2:17 PM Joel Bernstein  wrote:

> The more I think about this issue the more I think the
> Solr-Operator shouldn't be changing CR objects. My understanding of this is
> that the operator takes it's direction from the CR object and not the other
> way around. So for the operator to write back to a CR object, even with
> equivalent values, breaks that contract.
>
> I'll create an issue for this so we can discuss how to handle this.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Nov 23, 2021 at 1:13 PM Joel Bernstein  wrote:
>
>> A little more testing has shown that the cpu value is only reformatted if
>> it's a multiple of 1000.
>>
>> I'm not sure if the solr-operator can control this or if this part of the
>> go operator framework. But if it's controllable ideally we would probably
>> not reformat the cpu.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Tue, Nov 23, 2021 at 12:14 PM Joel Bernstein 
>> wrote:
>>
>>> As part of the collections operator we are working on we need to compare
>>> the value of the collection's cpu resource with the value of the underlying
>>> SolrCloud cpu resource. What we found is that the SolrCloud cpu resource is
>>> being formatted by the Solr operator which makes it tricky to do this
>>> comparison. I believe it's using the following go function to format the
>>> cpu value. I just wanted to confirm that if we port this logic to java
>>> we'll be able make this comparison.
>>>
>>> // NewMilliQuantity returns a new Quantity representing the given
>>> // value * 1/1000 in the given format. Note that BinarySI formatting
>>> // will round fractional values, and will be changed to DecimalSI for
>>> // values x where (-1 < x < 1) && (x != 0).
>>> func NewMilliQuantity(value int64, format Format) *Quantity {
>>> return &Quantity{
>>> i:  int64Amount{value: value, scale: -3},
>>> Format: format,
>>> }
>>> }
>>>
>>>
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>


Re: Number of cores per node

2021-11-23 Thread Rahul Goswami
Tulsi Das,
Thanks for pointing me to the links, but unfortunately these articles/JIRA
don't answer my question. Moreover, they are over a decade old and much has
changed since then in the Solr world :)
I would still like to know people's experiences handling a large number of
Solr cores on a single node and at what point do cracks start to appear
with a large number of loaded cores for their hardware configuration.
I am aware of the option of marking cores transient and the loadOnStartup
option, but these options have their own limitations.

Still looking forward to getting more inputs on this thread!

Thanks,
Rahul



On Thu, Nov 18, 2021 at 9:27 AM Tulsi Das  wrote:

> refer this as well :
>
> https://solr-user.lucene.apache.narkive.com/d7Y5M6Cl/max-no-of-solr-cores-supported-and-how-to-restrict-a-query-to-a-particular-core
>
>
> On Thu, Nov 18, 2021 at 7:54 PM Tulsi Das  wrote:
>
> > Probably you can take a look at this jira :
> > https://issues.apache.org/jira/browse/SOLR-1028 to get an idea.
> >
> > On Sat, Nov 13, 2021 at 1:29 AM Rahul Goswami 
> > wrote:
> >
> >> To clarify, I mean the number of Solr cores per JVM instance. How many
> >> Solr
> >> cores on one node is too many?
> >> As mentioned, it would be nice to get the numbers for a Solr instance
> >> running in standalone mode (aka non-SolrCloud), but if not, would also
> be
> >> interested to know about people's experience with SolrCloud in this
> >> regard.
> >>
> >> Thanks,
> >> Rahul
> >>
> >>
> >> On Fri, Nov 12, 2021 at 2:29 PM Tulsi Das 
> >> wrote:
> >>
> >> > Hi Rahul ,
> >> > Are you asking about the cores (number of replica's per node) or the
> cpu
> >> > cores?
> >> >
> >> > On Sat, Nov 13, 2021, 12:50 AM Walter Underwood <
> wun...@wunderwood.org>
> >> > wrote:
> >> >
> >> > > We’ve run on AWS instances with 72 CPUs. They all get used.
> >> Throughput is
> >> > > linear with the number of CPUs. You need enough free RAM to cache
> all
> >> of
> >> > > the index files in OS file buffers.
> >> > >
> >> > > The entire point of avoiding locking in the Lucene index is so that
> >> > > multiple threads can read it without contention. We made the same
> >> > decision
> >> > > in the Ultraseek index design 25 years ago.
> >> > >
> >> > > We don’t do any special JVM tuning. We use the config that Shawn
> >> Heisey
> >> > > recommended five years ago. We recently increased the heap from 8
> GB
> >> to
> >> > 16
> >> > > GB.
> >> > >
> >> > > GC_TUNE=" \
> >> > > -XX:+UseG1GC \
> >> > > -XX:+ParallelRefProcEnabled \
> >> > > -XX:G1HeapRegionSize=8m \
> >> > > -XX:MaxGCPauseMillis=200 \
> >> > > -XX:+UseLargePages \
> >> > > -XX:+AggressiveOpts \
> >> > > "
> >> > >
> >> > > wunder
> >> > > Walter Underwood
> >> > > wun...@wunderwood.org
> >> > > http://observer.wunderwood.org/  (my blog)
> >> > >
> >> > > > On Nov 12, 2021, at 7:41 AM, Deepak Goel 
> wrote:
> >> > > >
> >> > > > My guess is (please note it is not a benchmark): you would need a
> >> lot
> >> > of
> >> > > > tuning to make Solr use 32 cpu cores per node. After 4 cpu cores,
> >> you
> >> > > would
> >> > > > have to start tuning Solr, the JVM, your app (requirements), IOPS.
> >> > > >
> >> > > > Deepak
> >> > > > "The greatness of a nation can be judged by the way its animals
> are
> >> > > treated
> >> > > > - Mahatma Gandhi"
> >> > > >
> >> > > > +91 73500 12833
> >> > > > deic...@gmail.com
> >> > > >
> >> > > > Facebook: https://www.facebook.com/deicool
> >> > > > LinkedIn: www.linkedin.com/in/deicool
> >> > > >
> >> > > > "Plant a Tree, Go Green"
> >> > > >
> >> > > > Make In India : http://www.makeinindia.com/home
> >> > > >
> >> > > >
> >> > > > On Fri, Nov 12, 2021 at 8:33 PM Rahul Goswami <
> >> rahul196...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > >> Hi,
> >> > > >> Does anyone have benchmarks on performance as the number of cores
> >> on a
> >> > > Solr
> >> > > >> node goes up? I am trying to get an idea about how many cores per
> >> node
> >> > > is
> >> > > >> too much. Assume 31 GB heap size, SSD disk and 32 CPU cores.
> >> > > >> Preferably non-SolrCloud (aka standalone), but even if you have
> >> > insights
> >> > > >> from SolrCloud that would be a good start.
> >> > > >> I am using Solr 7.7.2.
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Rahul
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >
>