Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Satya Nand
Hi,
Good Morning.

We have an 8+1 node Solr cluster, where one indexing node contains all 8 NRT
primary replicas; this is where all indexing happens. Then we have another 8
nodes, each holding a PULL replica of one primary shard.

To limit queries to the PULL replicas, we have made the following changes in
solrconfig (along with shard whitelisting):

true
<str name="shards">10.128.74.11:6086/solr/im-search,10.128.74.11:6087/solr/im-search</str>

But after the changes, the requests go into an infinite loop. I found this in
the documentation, but I couldn't understand what a standard vs. nonstandard
request handler is.

*"Do not add the shards parameter to the standard request handler; doing so
> may cause search queries may enter an infinite loop. Instead, define a new
> request handler that uses the shards parameter, and pass distributed search
> requests to that handler."*


So, how can we use the shards parameter in solrconfig to limit the shards? I
think one alternative would be to pass the shards parameter in the URL instead
of in solrconfig.
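
For illustration, the URL-based alternative would be something like this (a
sketch; the handler name is just a placeholder):

http://10.128.74.11:6086/solr/im-search/select?q=*:*&shards=10.128.74.11:6086/solr/im-search,10.128.74.11:6087/solr/im-search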

But we would prefer to use solrconfig, so that the changes stay in the config only.


Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Shawn Heisey

On 10/10/22 01:57, Satya Nand wrote:


*"Do not add the shards parameter to the standard request handler; doing so
may cause search queries may enter an infinite loop. Instead, define a new
request handler that uses the shards parameter, and pass distributed search
requests to that handler."*


So, how can we use the shards parameter in solrconfig to limit the shards? I
think one alternative would be to pass the shards parameter in the URL instead
of in solrconfig.

But we would prefer to use solrconfig, so that the changes stay in the config only.


The standard request handler is usually the one named "/select". You may 
want to add a new handler for this purpose.


Your message subject says you are in cloud mode.  If that is true, I 
think you are going to want to specify shards by name, not URL.  If you 
are in standalone mode (no zookeeper) then the way I handled that was to 
build a special core with an empty index that had a predefined list of 
shard URLs in the /select handler.  When I did that, I was using the 
"defaults" parameter config.  I think if I did it again I would use 
"invariants" so the user would not be able to override the list.


Thanks,
Shawn



Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Satya Nand
Hi Shawn,

>
> The standard request handler is usually the one named "/select". You may
> want to add a new handler for this purpose.


We are already using a custom request handler; actually, there is no
/select handler in our solrconfig.


> Your message subject says you are in cloud mode.  If that is true, I
> think you are going to want to specify shards by name, not URL.


Yes, we are using SolrCloud. The reason I don't want to specify the shard
names is that a request can then be sent to any replica of a shard, based on
preference and availability, but I specifically want to limit requests to the
PULL-type replica of each shard.

I am trying to replicate the behavior on this link,

https://solr.apache.org/guide/8_4/distributed-requests.html#limiting-which-shards-are-queried

This section:

> Alternatively, you can specify a list of replicas you wish to use in place
> of shard IDs by separating the replica IDs with commas:
>
> http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted,localhost:8983/solr/gettingstarted

But when I do this, my request goes into an infinite loop.

Is there anything I can do to make it work? I just want to use a specific set
of replicas with shards.tolerant=true.





On Mon, Oct 10, 2022 at 5:07 PM Shawn Heisey  wrote:

> On 10/10/22 01:57, Satya Nand wrote:
>
> >> *"Do not add the shards parameter to the standard request handler;
> doing so
> >> may cause search queries may enter an infinite loop. Instead, define a
> new
> >> request handler that uses the shards parameter, and pass distributed
> search
> >> requests to that handler."*
> >
> > so, How can we use shards parameters in solr config to limit the shards?
> I
> > think one alternative will be to pass the shards parameter in URL instead
> > of solrconfig.
> >
> > But we would want to use the solrconfig to limit the changes in config
> only.
>
> The standard request handler is usually the one named "/select". You may
> want to add a new handler for this purpose.
>
> Your message subject says you are in cloud mode.  If that is true, I
> think you are going to want to specify shards by name, not URL.  If you
> are in standalone mode (no zookeeper) then the way I handled that was to
> build a special core with an empty index that had a predefined list of
> shard URLs in the /select handler.  When I did that, I was using the
> "defaults" parameter config.  I think if I did it again I would use
> "invariants" so the user would not be able to override the list.
>
> Thanks,
> Shawn
>
>


Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Shawn Heisey

On 10/10/22 06:00, Satya Nand wrote:

Yes, we are using SolrCloud. The reason I don't want to specify the shard
names is that a request can then be sent to any replica of a shard, based on
preference and availability, but I specifically want to limit requests to the
PULL-type replica of each shard.

I am trying to replicate the behavior on this link,


This is a perfect use case for the shards.preference parameter. Use 
"shards.preference=replica.type:PULL" along with a list of shard names.  
If there is at least one PULL replica available for a shard, it will be 
used. It will only try other replica types as a last resort.
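
For example, something like this (a sketch; the collection and shard names are
placeholders):

http://localhost:8983/solr/im-search/select?q=*:*&shards=shard1,shard2&shards.preference=replica.type:PULL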


https://solr.apache.org/guide/8_4/distributed-requests.html#shards-preference-parameter

Thanks,
Shawn



Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Satya Nand
Shawn,
Actually, we were using the preference parameter, but recently we faced an
issue where one PULL replica went down (due to a GCP machine restart) and
requests started going to the NRT replica. The machine hosting the NRT
replicas is pretty weak.

That's why I was experimenting with the shards parameter with all the URLs of
the PULL replicas, so the request has no option to go to any other replicas.
I am also planning to use shards.tolerant so that in case one or more replicas
are down, we can get the response from the remaining replicas.

Based on the documentation link I posted, it says that we can do this, but I
am not able to make it work.
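
What I am aiming for is roughly this (a sketch; the two URLs stand in for the
full list of our PULL replicas, and the handler name is a placeholder):

http://10.128.74.11:6086/solr/im-search/select?q=*:*&shards=10.128.74.11:6086/solr/im-search,10.128.74.11:6087/solr/im-search&shards.tolerant=true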

On Mon, 10 Oct, 2022, 6:14 pm Shawn Heisey,  wrote:

> On 10/10/22 06:00, Satya Nand wrote:
> > Yes, we are using solr cloud. The reason I don't want to specify the
> > shards' names is that a request can be sent to any replica of a shard
> based
> > on preference and availability but I specifically want to limit a request
> > to a PULL-type replica of a shard.
> >
> > I am trying to replicate the behavior on this link,
>
> This is a perfect use case for the shards.preference parameter. Use
> "shards.preference=replica.type:PULL" along with a list of shard names.
> If there is at least one PULL replica available for a shard, it will be
> used. It will only try other replica types as a last resort.
>
>
> https://solr.apache.org/guide/8_4/distributed-requests.html#shards-preference-parameter
>
> Thanks,
> Shawn
>
>


Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Shawn Heisey

On 10/10/22 06:58, Satya Nand wrote:

Actually, we were using the preference parameter, but recently we faced an
issue where one PULL replica went down (due to a GCP machine restart) and
requests started going to the NRT replica. The machine hosting the NRT
replicas is pretty weak.

That's why I was experimenting with the shards parameter with all the URLs of
the PULL replicas, so the request has no option to go to any other replicas.
I am also planning to use shards.tolerant so that in case one or more replicas
are down, we can get the response from the remaining replicas.


Some choices:

* Bump up the hardware hosting the NRT replicas so they can also handle queries.
* Add another set of PULL replicas on separate hardware.
* Adjust your systems so that each one hosts a PULL replica for two different shards.
* Rearrange things so that each system hosts an NRT replica for one shard and a PULL replica for a different shard.


Thanks,
Shawn



Re: Shards Parameter causing Infinite loop in solr cloud search

2022-10-10 Thread Satya Nand
Thanks, Shawn, for sharing all the possibilities; we will try to evaluate
them.



On Mon, 10 Oct, 2022, 6:45 pm Shawn Heisey,  wrote:

> On 10/10/22 06:58, Satya Nand wrote:
> > Actually we were using the preference parameter but recently we faced an
> > issue where 1 pull replica got down(due to gcp machine restart) and
> > requests started going to the NRT replica.
> > Machine hosting NRT replica is pretty weak.
> >
> > That's why I was experimenting with with shards parameter with all the
> urls
> > of pull replicas. So request has no option to go to any other shards.
> > Also planning to use shards.tolrent so that in case one or more replica
> is
> > down , we can get the response from remaining replicas.
>
> Some choices:
>
> * Bump up the hardware hosting the NRT replicas so they can also handle
> queries.
> * Add another set of PULL replicas on separate hardware.
> * Adjust your systems so that each one hosts a PULL replica for two
> different shards.
> * Rearrange things so that each system hosts an NRT replica for one
> shard and a PULL replica for a different shard.
>
> Thanks,
> Shawn
>
>


RE: Node backup using replication

2022-10-10 Thread Joe Jones (DHCW - Software Development)
We had Solr running as a Windows service under the generic 'system' account.
We tried running it under a user account with elevated permissions AND also gave
that account full security control over the folder in question, and we still
see the same permission error.
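
For reference, the call we are making is along these lines (a sketch of the
ReplicationHandler backup command; the host and port are placeholders, the core
name is taken from the error below, and the location would be URL-encoded in
practice):

http://localhost:8983/solr/wcrs_shard3_replica_n9/replication?command=backup&location=D:\Solr\backup\node1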

Full error logged:

ERROR (qtp1327871893-22) [c:wcrs s:shard3 r:core_node10 x:wcrs_shard3_replica_n9] o.a.s.h.ReplicationHandler Exception while creating a snapshot => java.security.AccessControlException: access denied ("java.io.FilePermission" "D:\Solr\backup\node1" "read")
    at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
java.security.AccessControlException: access denied ("java.io.FilePermission" "D:\Solr\backup\node1" "read")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]
    at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:416) ~[?:?]
    at java.lang.SecurityManager.checkRead(SecurityManager.java:756) ~[?:?]
    at sun.nio.fs.WindowsChannelFactory.open(WindowsChannelFactory.java:300) ~[?:?]
    at sun.nio.fs.WindowsChannelFactory.newFileChannel(WindowsChannelFactory.java:168) ~[?:?]
    at sun.nio.fs.WindowsFileSystemProvider.checkReadAccess(WindowsFileSystemProvider.java:331) ~[?:?]
    at sun.nio.fs.WindowsFileSystemProvider.checkAccess(WindowsFileSystemProvider.java:380) ~[?:?]
    at sun.nio.fs.AbstractFileSystemProvider.exists(AbstractFileSystemProvider.java:151) ~[?:?]
    at java.nio.file.Files.exists(Files.java:2522) ~[?:?]
    at org.apache.solr.core.backup.repository.LocalFileSystemRepository.exists(LocalFileSystemRepository.java:110) ~[?:?]
    at org.apache.solr.handler.SnapShooter.validateCreateSnapshot(SnapShooter.java:161) ~[?:?]
    at org.apache.solr.handler.ReplicationHandler.doSnapShoot(ReplicationHandler.java:660) ~[?:?]
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:299) ~[?:?]
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:207) ~[?:?]
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2866) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:887) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:606) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:239) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:207) ~[?:?]
    at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257) ~[?:?]
    at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:202) ~[?:?]
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184) ~[?:?]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201) ~[jetty-servlet-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[jetty-servlet-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) ~[jetty-servlet-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[jetty-security-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) ~[jetty-servlet-9.4.44.v20210927.jar:9.4.44.v20210927]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.4

Re: Node backup using replication

2022-10-10 Thread Shawn Heisey

On 10/10/22 09:23, Joe Jones (DHCW - Software Development) wrote:

java.security.AccessControlException: access denied ("java.io.FilePermission" 
"D:\Solr\backup\node1" "read")


This is saying that it failed to READ that directory.  I had expected to 
see a failure to WRITE.


Maybe that will be a clue that someone at your organization can use to 
figure out what's wrong.
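
If the Java security manager is what is denying access (the
AccessControlException points that way), two experiments might help narrow it
down. This is a sketch for solr.in.cmd, and both lines are assumptions about
your setup rather than a confirmed fix:

REM allow Solr to touch the backup path
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.allowPaths=D:\Solr\backup
REM or temporarily disable the security manager to confirm it is the culprit
set SOLR_SECURITY_MANAGER_ENABLED=false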


Thanks,
Shawn



Re: Node backup using replication

2022-10-10 Thread Dave
Exactly. On Linux I would just chmod 777 such a directory anyway, since no one
outside of the machine can get to it; no Solr server should have a public IP.

> On Oct 10, 2022, at 12:51 PM, Shawn Heisey  wrote:
> 
> On 10/10/22 09:23, Joe Jones (DHCW - Software Development) wrote:
>> java.security.AccessControlException: access denied 
>> ("java.io.FilePermission" "D:\Solr\backup\node1" "read")
> 
> This is saying that it failed to READ that directory.  I had expected to see 
> a failure to WRITE.
> 
> Maybe that will be a clue that someone at your organization can use to figure 
> out what's wrong.
> 
> Thanks,
> Shawn
> 


Solr 6 Replication question

2022-10-10 Thread mtn search
Hello,  I am learning more about replication as I maintain a large set of
Solr 6 servers configured for master/slave.

I noticed during some replication activities that, in addition to the original
index dir under the core name on the file system, there is a dir named "index"
with a timestamp: index.<timestamp>.  Files are written to this timestamped
dir during replication.  I am interested in how this works:

Is this timestamped dir created for every core replicating from its master?

Or is this timestamped dir created/used only in special circumstances?  If
so, which?

  - Are there cases that cause a full replication within Solr 6?

Is the original index dir removed and the timestamped dir renamed to
"index" after replication?

I initially figured all replication activities happened within the index
dir, but that does not appear to be the case.

Any tips or documentation references would be appreciated.

Thanks,
Matt


Re: Solr 6 Replication question

2022-10-10 Thread mtn search
As I go back through
https://solr.apache.org/guide/6_6/index-replication.html, the picture is
filling in a little more.  My guess is that the tmp dir it references is the
index.<timestamp> dir.

I am very interested in the cases that might generate a full replication.  To
my knowledge, no optimize command has been issued against the core in question.

On Mon, Oct 10, 2022 at 12:38 PM mtn search  wrote:

> Hello,  I am learning more about replication as I maintain a large Solr 6
> set of Solr servers configured for Master/Slave.
>
> I noticed during some replication activities in addition to the original
> index dir under the core name on the file system is a dir named "index"
> with a timestamp.  index..  Files are written to this dir with
> the timestamp during replication.  I am interested in how this works:
>
> For every core replicating to it's master is this timestamped dir
> created?
>
> Or is this timestamped dir created/used for only special circumstances?
> If so, what?
>
>   - Are there cases that cause a full replication within Solr 6?
>
> Is the original index dir removed and the time stamped dir renamed to
> "index" after replication?
>
> I initially figured all replication activities happened within the index
> dir, but that does not appear to be the case.
>
> Any tips, or documentation references would be appreciated.
>
> Thanks,
> Matt
>


Re: Solr 6 Replication question

2022-10-10 Thread Dave
Only an optimize or a large segment merge would cause large file deposits
there. That's why "slaves" should always have double the index size available:
Solr will decide on its own when to merge or optimize on the master, so the
slaves need to be ready for double the size, and the master needs to be ready
for triple the size.  If you don't have the disk space ready to handle this,
you're eventually going to run into some serious issues, or just not be able to
replicate.
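
To put rough numbers on it (illustrative only): with a 100 GB index, each
slave should have at least 200 GB available (the live index plus a full copy
coming in during replication), and the master around 300 GB (the live index,
the rewritten copy from a merge/optimize, and the old files still being served
to slaves that are mid-pull).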

-dave

> On Oct 10, 2022, at 2:56 PM, mtn search  wrote:
> 
> As I go back through
> https://solr.apache.org/guide/6_6/index-replication.html, the picture is
> filling in a little more.  My guess the tmp dir referenced, is the
> index. dir.
> 
> Very interested in cases that might generate a full replication.  To my
> knowledge no optimize commands has been issued against the core in question.
> 
>> On Mon, Oct 10, 2022 at 12:38 PM mtn search  wrote:
>> 
>> Hello,  I am learning more about replication as I maintain a large Solr 6
>> set of Solr servers configured for Master/Slave.
>> 
>> I noticed during some replication activities in addition to the original
>> index dir under the core name on the file system is a dir named "index"
>> with a timestamp.  index..  Files are written to this dir with
>> the timestamp during replication.  I am interested in how this works:
>> 
>> For every core replicating to it's master is this timestamped dir
>> created?
>> 
>> Or is this timestamped dir created/used for only special circumstances?
>> If so, what?
>> 
>>  - Are there cases that cause a full replication within Solr 6?
>> 
>> Is the original index dir removed and the time stamped dir renamed to
>> "index" after replication?
>> 
>> I initially figured all replication activities happened within the index
>> dir, but that does not appear to be the case.
>> 
>> Any tips, or documentation references would be appreciated.
>> 
>> Thanks,
>> Matt
>> 


Re: Pull Interval in Pull Type replicas of Solr Cloud ?

2022-10-10 Thread Dominique Bejean
Hi,

On the Sematext blog, I read this about the TLOG replication interval: the
poll time from replica to leader is set to half of the autoCommit maxTime or,
if autoCommit is not defined, to 50% of autoSoftCommit. If neither is present,
it is set to 1500 milliseconds.
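
For example, with a commit setup like this in solrconfig.xml (a sketch), the
rule above would give a poll interval of about 30 seconds:

<autoCommit>
  <!-- hard commit every 60 s => TLOG poll interval of roughly 30 s -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>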

There are no details for PULL replicas, but the replication interval should be
longer than for TLOG.

Dominique




Le lun. 3 oct. 2022 à 14:36, Markus Jelsma  a
écrit :

> Yes, you could say that. A commit will cause a segment to be
> created/merged. Once the leader node finishes that proces, the mutations
> are send to the follower nodes. If you have a steady commit interval, then
> you should see a reasonably steady replication interval.
>
> Regards,
> Markus
>
> Op ma 3 okt. 2022 om 12:05 schreef Satya Nand
> :
>
> > Hi Markus,
> > thank you.
> > so in this case the Commit interval will become the polling interval?
> > frequent commit => frequent replication ?
> >
> >
> > On Mon, Oct 3, 2022 at 3:31 PM Markus Jelsma  >
> > wrote:
> >
> > > Hello Satya,
> > >
> > > There is no replication interval to define. The PULL of TLOG replicas
> > will
> > > pull new segment data from the current shard leader as they become
> > > available. No specific configuration is needed.
> > >
> > > Regards,
> > > Markus
> > >
> > > Op ma 3 okt. 2022 om 11:48 schreef Satya Nand
> > > :
> > >
> > > > Hi,
> > > >
> > > > Is there any configuration, where we can define the replication
> > interval
> > > > when pull replica should pull indexes from NRT replicas?
> > > >
> > >
> >
>


solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread Jen-Ya Ku
Hi all,

We've deployed Solr 9 on OpenJDK 17 and it crashed after a few hours with the
following error:
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f834389b332, pid=8997, tid=9025
#
# JRE version: OpenJDK Runtime Environment Microsoft-40354 (17.0.4.1+1)
(build 17.0.4.1+1-LTS)
# Java VM: OpenJDK 64-Bit Server VM Microsoft-40354 (17.0.4.1+1-LTS, mixed
mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc,
linux-amd64)

# Problematic frame:
# V  [libjvm.so+0xad1332]  PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0xf2
#
I've attached the full log with this email. Is it a known issue?

Thanks,
Jen-Ya


Re: solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread Dave
I won’t say for certain, as I have never seen this, but it seems like a garbage
collection situation. Look there first to see if you can rule that out as the
cause.

> On Oct 10, 2022, at 5:59 PM, Jen-Ya Ku  wrote:
> 
> 
> Hi all, 
> 
> We've deployed solr9 on OpenJDK 17 and it crashed after few hours with 
> following error: 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f834389b332, pid=8997, tid=9025
> #
> # JRE version: OpenJDK Runtime Environment Microsoft-40354 (17.0.4.1+1) 
> (build 17.0.4.1+1-LTS)
> # Java VM: OpenJDK 64-Bit Server VM Microsoft-40354 (17.0.4.1+1-LTS, mixed 
> mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, 
> linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0xad1332]  PhaseIdealLoop::build_loop_late_post_work(Node*, 
> bool)+0xf2
> #
> I've attached the full log with this email. Is it a known issue?
> 
> Thanks,
> Jen-Ya 


Re: solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread Jen-Ya Ku
Thanks, Dave.
It looks like PhaseIdealLoop::build_loop_late_post_work is part of the JIT
(HotSpot runtime compiler)?
We got this error right after upgrading to Solr 9 from Solr 8.11.

Thanks,
Jen-Ya

On Mon, Oct 10, 2022 at 3:25 PM Dave  wrote:

> I won’t say for certain as I have never seen this but this seems like a
> garbage collection situation. Look there first to see if you can cancel
> that out as the cause
>
> > On Oct 10, 2022, at 5:59 PM, Jen-Ya Ku  wrote:
> >
> > 
> > Hi all,
> >
> > We've deployed solr9 on OpenJDK 17 and it crashed after few hours with
> following error:
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x7f834389b332, pid=8997, tid=9025
> > #
> > # JRE version: OpenJDK Runtime Environment Microsoft-40354 (17.0.4.1+1)
> (build 17.0.4.1+1-LTS)
> > # Java VM: OpenJDK 64-Bit Server VM Microsoft-40354 (17.0.4.1+1-LTS,
> mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc,
> linux-amd64)
> > # Problematic frame:
> > # V  [libjvm.so+0xad1332]
> PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0xf2
> > #
> > I've attached the full log with this email. Is it a known issue?
> >
> > Thanks,
> > Jen-Ya
>


Re: solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread Shawn Heisey

On 10/10/22 15:58, Jen-Ya Ku wrote:
We've deployed solr9 on OpenJDK 17 and it crashed after few hours with 
following error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f834389b332, pid=8997, tid=9025
#
# JRE version: OpenJDK Runtime Environment Microsoft-40354 
(17.0.4.1+1) (build 17.0.4.1+1-LTS)
# Java VM: OpenJDK 64-Bit Server VM Microsoft-40354 (17.0.4.1+1-LTS, 
mixed mode, sharing, tiered, compressed oops, compressed class ptrs, 
g1 gc, linux-amd64)

# Problematic frame:
# V  [libjvm.so+0xad1332]  PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0xf2

#
I've attached the full log with this email. Is it a known issue?


Your attachment did not come through.  It's generally a bad idea to rely 
on attachments, because the mailing list eats almost all of them.


https://komodor.com/learn/sigsegv-segmentation-faults-signal-11-exit-code-139/

Way back in the dustbin of history, if somebody got signal 11 while 
compiling the Linux kernel, the general advice was "This might be bad 
hardware.  It could be a software problem too."


The crash information looks to me like a problem with Java.  I would 
strongly recommend using Java 11 instead of Java 17 with current 
versions of Solr.  The latest 11.0.xx version you can get.  Some variety 
of OpenJDK is probably the way to go, unless you're willing to pay 
Oracle for the right to run it in production.  Everything seems to be 
very stable for me running with OpenJDK 11.0.16 on Ubuntu Server.  I did 
give OpenJDK 16 a try.  It was something older than the version you're 
running.
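
If switching JDKs is not immediately possible, a common stopgap for C2
compiler crashes (a sketch; the method name is a placeholder — the real one is
named in the hs_err crash log) is to exclude the offending method from JIT
compilation via SOLR_OPTS:

SOLR_OPTS="$SOLR_OPTS -XX:CompileCommand=exclude,org.example.SomeClass::someMethod"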


It looks like you may have a JDK that's packaged by Microsoft for 
Linux.  Not sure I would trust Microsoft to get that right.


Thanks,
Shawn



Re: solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread dmitri maziuk

On 2022-10-10 4:58 PM, Jen-Ya Ku wrote:

Hi all,

We've deployed solr9 on OpenJDK 17 and it crashed after few hours with 
following error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f834389b332, pid=8997, tid=9025


What are you running it on? -- E.g., some SuperMicro motherboards ship
with fan speed set to "balanced" in the BIOS and have no thermal sensors
under the RAM banks. Can you guess what happens when a memory-intensive
job comes along and runs long enough for the SIMMs to heat up?


Dima



Re: Understanding LTR debug query output

2022-10-10 Thread Alessandro Benedetti
Hi,
from what I see you are using a Neural Network implementation as the model
(org.apache.solr.ltr.model.NeuralNetworkModel ?) and I agree is
definitely not the best in terms of explainability
(org.apache.solr.ltr.model.NeuralNetworkModel#explain).

Effectively it just summarizes the layers; the score is calculated using the
weights in the layers and the activation functions.
To be fair, even with a detailed formula I suspect that, as a human, you
wouldn't get much more out of it anyway.
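
Roughly, and assuming standard dense layers with biases (the explain output
does not show them), the score for your model shape works out to:

score = sigmoid( W2 · relu( W1 · x + b1 ) + b2 )

where x is the vector of 12 feature values, W1 the 75x12 first-layer matrix,
and W2 the 1x75 output-layer matrix.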

For the features, it should be easier to explain why they have those values;
take a look at the way you defined them in features.json.
If hierScore is just a field value and doesn't match, it is possibly a bug,
maybe related to the numerical representation? What is the field type?
Cheers


--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github


On Fri, 7 Oct 2022 at 12:32, gnandre  wrote:

> Hi,
>
> I have implemented LTR (LambdaRank) functionality but there are some search
> cases where the relevancy is actually getting worse. I am trying to
> understand why some results are ranked over the others. Naturally, I am
> using a debug query to understand what is going on.
>
> e.g. here is the explain response for one of the document:
>
> "doc:en:/help/coder/index.html": "\n0.93952394 =
>
> (name=model,featureValues=[linkScore=1.7102735,hierScore=3.9314165,originalScore=0.029598212,tfidf_title=-0.3270329,tfidf_body=-0.6185444,tfidf_url=-0.8011434,tfidf_file_name=-0.37964302,tfidf_primary_header_en=-0.32059863,tfidf_secondary_header_en=0.36570454,tfidf_meta_description_en=-0.09497543,tfidf_inlink_text_en=-0.08638504,tfidf_indexed_not_highlighted_en=-0.2544066],layers=[(matrix=75x12,activation=relu),(matrix=1x75,activation=sigmoid)])\n
>
> Can somebody tell me how the final score of 0.93952394 is getting
> calculated for this document? Also, how are the featureValues
> calculated? e.g. hierScore field value for this document is actually
> 0.5 but it shows up here as 3.9314165.
>