Re: Solr connection refused error

2023-12-13 Thread Anuj Bhargava
I noticed that whenever a large amount of data is being indexed, it stops.

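Configuration is -

driver="com.mysql.jdbc.Driver"
batchSize="-1"
autoReconnect="true"
useSSL="false"
socketTimeout="0"
connectTimeout="0"
encoding="UTF-8"
url="jdbc:mysql://xxx.xxx.xx.xx:3306/globe-data?zeroDateTimeBehavior=convertToNull"
user="globe-user"
password="globe-password"/>

*My jetty.xml file is* -

class="org.eclipse.jetty.server.HttpConfiguration">
https
default="8443" />
name="solr.jetty.output.buffer.size" default="32768" />
name="solr.jetty.output.aggregation.size" default="32768" />
name="solr.jetty.request.header.size" default="65536" />
name="solr.jetty.response.header.size" default="32768" />
name="solr.jetty.send.server.version" default="false" />
name="solr.jetty.send.date.header" default="false" />
name="solr.jetty.header.cache.size" default="1024" />
name="solr.jetty.delayDispatchUntilContent" default="false" />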

Re: Java Upgrade process

2023-12-13 Thread Jim Morgan
I've had some success. Firstly, Shawn, I did find update-java-alternatives
with my googling, and what an eye-opener that was! I thought it would surely
solve the problem, but unfortunately it didn't.

So today I tried again. Because I hadn't noticed anything in the Solr logs
before, I decided to wipe the entire log directory and see what appeared.
I found my clue in the previously unnoticed solr-8983-console.log.
I'd been looking in solr.log and the various GC logs, but hadn't noticed
this one appearing. In fact, it was the only one that was created. It
mentioned a deprecated setting in my solr.in.sh, which I removed. I ran it
again, removed the next one, and so on until it finally ran, waiting the
mandatory 180 seconds between each restart.
These are the things I removed before it worked:
-XX:+PrintGCDetails
UseConcMarkSweepGC
PrintGCApplicationStoppedTime
PrintGCDateStamps
PrintGCTimeStamps
PrintHeapAtGC
PrintTenuringDistribution
UseParNewGC
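
For anyone checking their own solr.in.sh for the same leftovers, here is a minimal sketch that scans a file for the flag names listed above. The function name and the example path are assumptions for illustration, and the flag list is just the one given in this message, not a complete inventory of options that Java 11 rejects.

```shell
#!/usr/bin/env bash
# Flag names reported in this thread as removed before Solr started on Java 11.
LEGACY_GC_FLAGS="PrintGCDetails UseConcMarkSweepGC PrintGCApplicationStoppedTime PrintGCDateStamps PrintGCTimeStamps PrintHeapAtGC PrintTenuringDistribution UseParNewGC"

# Print every listed flag that still appears in the given file, one per line.
# (find_legacy_gc_flags is a hypothetical helper name, not part of Solr.)
find_legacy_gc_flags() {
  local file="$1" flag
  for flag in $LEGACY_GC_FLAGS; do
    # -w matches whole words only, so a flag never matches inside a longer name
    if grep -qw -- "$flag" "$file"; then
      echo "$flag"
    fi
  done
}

# Example call; the path is an assumption about where solr.in.sh lives:
# find_legacy_gc_flags /etc/default/solr.in.sh
```

An empty result only means none of these eight names are present; the console log remains the authority on what the JVM actually refused.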

So now it all seems to be chugging away on Java 11. I will probably leave it
on that for a while, until I have an urge to tangle with Solr again.
I understand that Java 13 is a no-no, as it's not an LTS release.

Thanks for your suggestions and advice.

Jim

On Wed, Dec 13, 2023 at 2:06 AM Chris Hostetter 
wrote:

>
> That's the console output, where the start script is waiting for the port
> to come up.
>
> what does the actual solr log file (written by the java process) say?
>
>
> https://solr.apache.org/guide/solr/latest/deployment-guide/taking-solr-to-production.html#log-settings
>
> : Date: Tue, 12 Dec 2023 10:57:37 +0800
> : From: Jim Morgan 
> : Reply-To: users@solr.apache.org
> : To: users@solr.apache.org
> : Subject: Re: Java Upgrade process
> :
> : Fair point Chris, I should have included log info in the original post.
> : So in the solr logs, nothing at all. I think because solr doesn't get to
> : the point of starting.
> : In syslog I have
> : -
> : Dec 11 06:35:08 server solr[6822]: Waiting up to 180 seconds to see Solr
> : running on port 8983 [|]  #010#010#010#010#010#010 [/]
> :  #010#010#010#010#010#010 [|]  #010#010#010#010#010#010 [/]
> : (lots and lots of these removed)
> : 010#010#010 [-]  #010#010#010#010#010
> : Dec 11 06:35:08 server solr[6897]: Still not seeing Solr listening on 8983
> : after 180 seconds!
> : -
> : There was a warning about Ulimits, which doesn't seem to occur under Java 8:
> : Dec 11 06:30:32 solr-dev solr[6266]: *** [WARN] *** Your open file limit is
> : currently 1024.
> : Dec 11 06:30:32 solr-dev solr[6266]:  It should be set to 65000 to avoid
> : operational disruption.
> : Dec 11 06:30:32 solr-dev solr[6266]:  If you no longer wish to see this
> : warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
> : But I set that in /etc/defaults/solr.in.sh and it didn't help.
> :
> : Jim
> :
> : On Tue, Dec 12, 2023 at 8:10 AM Chris Hostetter <hossman_luc...@fucit.org>
> : wrote:
> :
> : >
> : > can you be more specific about " It just waits and eventually fails after
> : > about 2 minutes." ... what message does it print to the console?  what
> : > does the solr log file say?
> : >
> : >
> : >
> : > : Date: Mon, 11 Dec 2023 11:36:15 +0800
> : > : From: Jim Morgan 
> : > : Reply-To: users@solr.apache.org
> : > : To: users@solr.apache.org
> : > : Subject: Java Upgrade process
> : > :
> : > : A few months back, I had a question about upgrading Solr. I was trying to
> : > : upgrade from 8.5 to 9.3. That introduced a few errors into our website
> : > : search that we were not able to resolve, so we've figured we'll just
> : > : upgrade to the latest 8.x version, i.e. 8.11.
> : > :
> : > : We've successfully upgraded the underlying OS, and we're happy 8.11 returns
> : > : the same search results as 8.5. So far so good.
> : > :
> : > : Now we're looking at Java. It's currently running on openjdk-8. I've tried
> : > : upgrading to openjdk-11 and openjdk-13. The process, as this is Ubuntu, is
> : > : basically:
> : > : apt install openjdk-13-jdk
> : > : update-alternatives --list java
> : > : update-alternatives --set java /usr/lib/jvm/java-13-openjdk-amd64/bin/java
> : > : On doing that, Solr will fail to start. It just waits and eventually fails
> : > : after about 2 minutes. So I can do
> : > : update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
> : > : to set it back to Java 8, and it runs again.
> : > :
> : > : I've tried regenerating the init script with the new Java, but the scripts
> : > : are identical. I've tried rebooting the server with the new Java active in
> : > : case it's an environment thing. But to no avail.
> : > :
> : > : Any ideas what's going on here?
> : > :
> : > : I notice Java 8 is still getting updates, so maybe the path of least
> : > : resistance is to stay on that.
> : > :
> : > : Jim
> : > :
> : >
> : > -Hoss
> : > http://www.lucidworks.com/
> : >
> :
>
> -Hoss
> http://www.lucidworks.com/


Re: Solr connection refused error

2023-12-13 Thread Deepak Goel
On Wed, Dec 13, 2023 at 1:35 PM Anuj Bhargava  wrote:

> Noticed that whenever large amount of data is being indexed it stops.
>
>
What is the error it spits out?


> Configuration is -
>
>  driver="com.mysql.jdbc.Driver"
> batchSize="-1"
> autoReconnect="true"
> useSSL="false"
> socketTimeout="0"
> connectTimeout="0"
> encoding="UTF-8"
>
>
> url="jdbc:mysql://xxx.xxx.xx.xx:3306/globe-data?zeroDateTimeBehavior=convertToNull"
> user="globe-user"
> password="globe-password"/>
> 
>
> *My jetty.xml file is* -
>
> 
> https
>  default="8443" />
>  name="solr.jetty.output.buffer.size" default="32768" />
>  name="solr.jetty.output.aggregation.size" default="32768" /><$
>  name="solr.jetty.request.header.size" default="65536" />
>  name="solr.jetty.response.header.size" default="32768" />
>  name="solr.jetty.send.server.version" default="false" />
>  default="false" />
>  name="solr.jetty.header.cache.size" default="1024" />
>  name="solr.jetty.delayDispatchUntilContent" default="false" />
> 

Re: Java Upgrade process

2023-12-13 Thread Jim Morgan
So, just as a follow up, I'm going to post my BEFORE and AFTER solr.in.sh
configs in case I've removed anything vital. Any comments welcome, as they
don't really mean much to me.
=== BEFORE ===
SOLR_HEAP="8000m"

GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"

ENABLE_REMOTE_JMX_OPTS="false"

SOLR_OPTS="$SOLR_OPTS -Xss256k"
SOLR_OPTS="$SOLR_OPTS -Dlog4j2.formatMsgNoLookups=true"
=== AFTER ===
SOLR_HEAP="8000m"

GC_LOG_OPTS="-verbose:gc -Xlog:gc*"

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseG1GC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"

ENABLE_REMOTE_JMX_OPTS="false"

SOLR_OPTS="$SOLR_OPTS -Xss256k"
SOLR_OPTS="$SOLR_OPTS -Dlog4j2.formatMsgNoLookups=true"

===
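
One comment on the AFTER block: it still carries CMS-specific options (the names beginning with CMS, plus UseCMSInitiatingOccupancyOnly) next to -XX:+UseG1GC. Those options only apply to the CMS collector, so they should have no effect once G1 is selected, and JVM releases that no longer ship CMS may warn about them or refuse to start. A trimmed GC_TUNE might look like the sketch below. The remaining values are simply carried over from the config above and are not tuning advice.

```shell
# Sketch of a G1-only GC_TUNE for solr.in.sh.
# Assumption: every CMS-specific option is dropped; the rest is unchanged.
GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseG1GC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:PretenureSizeThreshold=64m \
-XX:+ParallelRefProcEnabled"
```

Running the target JVM as `java $GC_TUNE -version` is a quick way to see whether it accepts every option before restarting Solr.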


Re: Solr connection refused error

2023-12-13 Thread Anuj Bhargava
12/13/2023, 11:04:12 AM ERROR false x:gt128_dga UpdateLog
java.io.EOFException
12/13/2023, 11:04:13 AM WARN false x:gt128_dga UpdateLog Log replay
finished. recoveryInfo=RecoveryInfo{adds=82717 deletes=0 deleteByQuery=0
errors=0 positionOfStart=0}


Re: Solr connection refused error

2023-12-13 Thread Deepak Goel
On Wed, Dec 13, 2023 at 1:51 PM Anuj Bhargava  wrote:

> 12/13/2023, 11:04:12 AM ERROR false x:gt128_dga UpdateLog
> java.io.EOFException
>

You are trying to update a file which does not exist. We need more details
about the file which you are trying to update.


> 12/13/2023, 11:04:13 AM WARN false x:gt128_dga UpdateLog Log replay
> finished. recoveryInfo=RecoveryInfo{adds=82717 deletes=0 deleteByQuery=0
> errors=0 positionOfStart=0}
>
> On Wed, 13 Dec 2023 at 13:38, Deepak Goel  wrote:
>
> > On Wed, Dec 13, 2023 at 1:35 PM Anuj Bhargava 
> wrote:
> >
> > > Noticed that whenever large amount of data is being indexed it stops.
> > >
> > >
> > What is the error it spits out?
> >
> >
> > > Configuration is -
> > >
> > >  > > driver="com.mysql.jdbc.Driver"
> > > batchSize="-1"
> > > autoReconnect="true"
> > > useSSL="false"
> > > socketTimeout="0"
> > > connectTimeout="0"
> > > encoding="UTF-8"
> > >
> > >
> > >
> >
> url="jdbc:mysql://xxx.xxx.xx.xx:3306/globe-data?zeroDateTimeBehavior=convertToNull"
> > > user="globe-user"
> > > password="globe-password"/>
> > > 
> > >
> > > *My jetty.xml file is* -
> > >
> > >  class="org.eclipse.jetty.server.HttpConfiguration">
> > > https
> > >  > > default="8443" />
> > >  > > name="solr.jetty.output.buffer.size" default="32768" />
> > >  > > name="solr.jetty.output.aggregation.size" default="32768" /><$
> > >  > > name="solr.jetty.request.header.size" default="65536" />
> > >  > > name="solr.jetty.response.header.size" default="32768" />
> > >  > > name="solr.jetty.send.server.version" default="false" />
> > >  > name="solr.jetty.send.date.header"
> > > default="false" />
> > >  > > name="solr.jetty.header.cache.size" default="1024" />
> > >  > > name="solr.jetty.delayDispatchUntilContent" default="false" />
> > > 

Re: Solr connection refused error

2023-12-13 Thread Anuj Bhargava
Last Modified:about 7 hours ago
Num Docs:803749
Max Doc:803749
Heap Memory Usage:271668
Deleted Docs:0
Version:3789
Segment Count:25
Current:

Instance
CWD:/opt/solr-8.11.2/server
Instance:/var/solr/data/gt128_dga
Data:/var/solr/data/gt128_dga/data
Index:/var/solr/data/gt128_dga/data/index
Impl:org.apache.solr.core.NRTCachingDirectoryFactory


Request processing has stalled

2023-12-13 Thread HariBabu kuruva
Hi All,

We are facing the error below in our PROD environment.
Could you please let me know what the possible causes of this issue are and
how to troubleshoot it?

We have 10 solr nodes.
Heap Memory: 30GB
Total VM Memory:60 GB
Solr Version: 8.11.1


*CmdDistributor java.io.IOException: Request processing has stalled for
20067ms with 100 remaining elements in the queue. => java.io.IOException:
Request processing has stalled for 20067ms with 100 remaining elements in
the queue.*


-- 

Thanks and Regards,
 Hari
Mobile:9790756568


Idle timeout expired

2023-12-13 Thread HariBabu kuruva
Hi All,
We are seeing the errors below in our Solr logs. Could you please suggest
what the possible causes of this issue are and how to deal with it?


*ERROR:*
Caused by: java.util.concurrent.*TimeoutException: Idle timeout expired*: 12/12 ms
at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171) ~[jetty-io-9.4.44.v20210927.jar:9.4.44.v20210927]
at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113) ~[jetty-io-9.4.44.v20210927.jar:9.4.44.v20210927]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
... 1 more
*Suppressed: java.lang.Throwable: HttpInput idle timeout*
at org.eclipse.jetty.server.HttpInput.onIdleTim

-- 

Thanks and Regards,
 Hari
Mobile:9790756568


Solr Query Alerting

2023-12-13 Thread Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
Hi All,

My team and I are trying to match documents against a set of queries. Both need 
to be updated in real-time (at least on the order of seconds) and the system 
needs to scale to potentially millions of queries. Although lucene monitor 
greatly optimizes this kind of alerting logic, as far as we know, it won't 
handle scaling as it isn't wrapped by a sophisticated runtime. 

Reading through this thread
https://lists.apache.org/list.html?dev@find_parent=true pointed me at
https://github.com/SOLR4189/solcolator. Overall, this project looks promising,
but I'm concerned that it was contributed by an anonymous user and is lacking
basic maintenance; at the very least it needs dependency upgrades. My second
concern is that it ingests _documents_ via AddUpdateCommand and then
writes the output to a configurable sink. In my team's use case, it would be
better to invert the flow: process documents in the SearchHandler (so that
the output can be written to SolrQueryResponse) and AddUpdate the queries to
take better advantage of Solr's sharding. The second concern is not as strong
as the first, as I am still thinking about how to implement the "solcolator"
within our constraints. Perhaps the document-as-query approach has its own
drawbacks that I am not seeing.

In either case I have two questions:

1. Has anyone here used solcolator and with what effect? I would be grateful to 
learn specifics about the scale of your solution.
2. What do you think about forking or riffing off this project and adding a 
document-as-query control flow?

As it stands now we are designing a system that coordinates lucene monitor by 
directly interacting with zookeeper and it feels like we are reinventing some 
of solr's capabilities. Any feedback anyone might have would be greatly 
appreciated.

Many thanks,
Luke

Re: Solr Query Alerting

2023-12-13 Thread Charlie Hull

Hi Luke,

We built Luwak, the basis of Solcolator, at my previous company Flax,
and donated it to Lucene. I've sent you a separate private email. Sadly, I
don't know who built Solcolator. It would indeed be awesome to get
this feature properly integrated into Solr.


Best

Charlie



--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II



Re: Solr Query Alerting

2023-12-13 Thread Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
Hey Charlie,

Thanks for your private message. I am replying here as well for posterity. I
think the links you shared may be helpful for future searchers:

hackathon feature branch luwak-server
https://github.com/flaxsearch/luwak/tree/luwak-server
Scott's blog post on the matter
https://opensourceconnections.com/blog/2016/02/05/luwak/

I'll try to follow up on this thread with anything more concrete, as we will be
discussing this internally.

Regards,
Luke






Solr Indexing cronjob

2023-12-13 Thread Anuj Bhargava
Which of the following is correct, or is there a better way?

curl -q http://user:passw...@xxx.xxx.xx.xxx:8983/solr/korea/dataimport?command=full-import&clean=true&commit=true >/dev/null 2>&1

curl -qsS -u user:password "http://xxx.xxx.xx.xxx:8983/solr/korea/dataimport?command=full-import&clean=true&commit=true" > /dev/null

curl -qsS -u user:password "http://xxx.xxx.xx.xxx:8983/solr/korea/dataimport?command=full-import&clean=true&commit=true" > /dev/null  2>&1

Regards,

Anuj


Re: Solr Indexing cronjob

2023-12-13 Thread Jim Morgan
The first one will sometimes fail when there are unusual characters in the
password. I'd either use
-u "user:password" (in quotes)
or
-H "Authorization: Basic 98sdfsasdfasdkjwere"
where the value after "Basic" is the Base64 encoding of user:password, sent
as a header.

The end of the command just deals with any output:
"> /dev/null" means normal output is discarded, but you'll still see error
messages.
">/dev/null 2>&1" means both normal output and error messages are discarded.
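
One more point in favour of the quoted forms: in the first command the URL is unquoted, so the shell treats each "&" as a command separator that backgrounds what precedes it, and curl only receives the URL up to command=full-import. A minimal sketch of a cron-friendly wrapper follows. The function names and the SOLR_USER and SOLR_PASS variables are assumptions for illustration; the host and core are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Build the DataImportHandler full-import URL for a host and core.
# (dih_full_import_url is a hypothetical helper name.)
dih_full_import_url() {
  local host="$1" core="$2"
  printf 'http://%s:8983/solr/%s/dataimport?command=full-import&clean=true&commit=true' "$host" "$core"
}

# Trigger the import.
#   -sS  hides the progress meter but still prints error messages
#   -f   makes curl exit non-zero on HTTP error responses
#   -u   passes credentials without embedding them in the URL
# SOLR_USER and SOLR_PASS are assumed to be set in the environment.
run_full_import() {
  curl -sS -f -u "$SOLR_USER:$SOLR_PASS" "$(dih_full_import_url "$1" "$2")" > /dev/null
}

# Example call from a cron script (placeholders as in the question):
# run_full_import xxx.xxx.xx.xxx korea
```

Because only standard output goes to /dev/null, cron would still get any error text that curl prints, which is usually what you want from an unattended job.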




Re: Solr connection refused error

2023-12-13 Thread Anuj Bhargava
There are some files in /var/solr/logs. For example -
solr_oom_killer-8983-2023-12-14_05_30_42.log

and  this contains -

Running OOM killer script for process 2861377 for Solr on port 8983
Killed process 2861377


Re: Solr connection refused error

2023-12-13 Thread Deepak Goel
On Thu, 14 Dec 2023, 10:11 Anuj Bhargava,  wrote:

> There are some files in /var/solr/logs. For example -
> solr_oom_killer-8983-2023-12-14_05_30_42.log
>
> and  this contains -
>
> Running OOM killer script for process 2861377 for Solr on port 8983
> Killed process 2861377
>

It looks like you are running out of memory. Increase your heap size.
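
A minimal sketch of that change, assuming a service install where the settings live in solr.in.sh (the same SOLR_HEAP variable appears in the Java Upgrade thread above). The value is hypothetical, since nothing in this thread says how much memory the machine has.

```shell
# solr.in.sh: JVM heap for Solr. "4g" is a placeholder, not a recommendation;
# pick a value that still leaves RAM free for the OS page cache.
SOLR_HEAP="4g"
```

Solr needs a restart to pick up the new value.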


> >> > > >  >> > > > name="solr.jetty.delayDispatchUntilContent" default="false"
> />
> >> > > > 

Re: Solr connection refused error

2023-12-13 Thread Anuj Bhargava
solr.in.sh has the following entries:
SOLR_JAVA_MEM="-Xms15g -Xmx15g"
SOLR_OPTS="$SOLR_OPTS -Dsolr.disable.shardsWhitelist=true"

In which file should I define SOLR_HEAP, and what would be the ideal value?
My server has 128 GB of RAM and a 2 TB SSD.

Regards,

Anuj

On Thu, 14 Dec 2023 at 10:56, Deepak Goel wrote:

> On Thu, 14 Dec 2023, 10:11 Anuj Bhargava,  wrote:
>
> > There are some files in /var/solr/logs. For example -
> > solr_oom_killer-8983-2023-12-14_05_30_42.log
> >
> > and  this contains -
> >
> > Running OOM killer script for process 2861377 for Solr on port 8983
> > Killed process 2861377
> >
>
> Looks  like you are running out of memory. Increase your heap size

Re: Solr connection refused error

2023-12-13 Thread Deepak Goel
On Thu, Dec 14, 2023 at 11:40 AM Anuj Bhargava wrote:

> solr.in.sh has the following entry -
> SOLR_JAVA_MEM="-Xms15g -Xmx15g"
> SOLR_OPTS="$SOLR_OPTS -Dsolr.disable.shardsWhitelist=true"
>

There could be a memory leak in your application. You need to profile it
(possibly with a memory profiler).
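
On the SOLR_HEAP question: it is defined in the same solr.in.sh file. As far as the stock start script goes, SOLR_HEAP simply sets -Xms and -Xmx to the same value and takes precedence over SOLR_JAVA_MEM, so the sketch below is equivalent to the entry already in place and would not change the heap by itself. The 15g value is copied from that entry and is not a recommendation; max_heap_from_java_mem is a hypothetical helper for checking what an existing entry sets.

```shell
# Sketch of the heap setting in solr.in.sh.
# SOLR_HEAP="15g" is equivalent to SOLR_JAVA_MEM="-Xms15g -Xmx15g".
# The value is taken from the existing entry, not a tuning suggestion.
SOLR_HEAP="15g"

# Hypothetical helper (not part of Solr): print the -Xmx value found in a
# SOLR_JAVA_MEM-style string such as "-Xms15g -Xmx15g".
max_heap_from_java_mem() {
  local opt
  # $1 is left unquoted on purpose so it splits into separate options.
  for opt in $1; do
    case "$opt" in
      -Xmx*) printf '%s\n' "${opt#-Xmx}" ;;
    esac
  done
}

# Usage: max_heap_from_java_mem "$SOLR_JAVA_MEM"
```

Solr has to be restarted for a change in solr.in.sh to take effect.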


>
> In which file should I define SOLR_HEAP and what should be the ideal value.
> My server has 128GB Ram and 2TB SSD
>
> Regards,
>
> Anuj
>
> On Thu, 14 Dec 2023 at 10:56, Deepak Goel  wrote:
>
> > On Thu, 14 Dec 2023, 10:11 Anuj Bhargava,  wrote:
> >
> > > There are some files in /var/solr/logs. For example -
> > > solr_oom_killer-8983-2023-12-14_05_30_42.log
> > >
> > > and  this contains -
> > >
> > > Running OOM killer script for process 2861377 for Solr on port 8983
> > > Killed process 2861377
> > >
> >
> > Looks  like you are running out of memory. Increase your heap size