[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615955#comment-17615955
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

AFAIK the propagation of `property.*` values to cores is accidental; the 
original purpose (again, AFAIK) was to be able to set auxiliary properties in 
`state.json`, to keep additional per-collection state that could be used by 
other components. The advantage of this is that they would automatically appear 
in DocCollection at the API level, unlike the COLLECTIONPROP API, which is 
incomplete because only the "write" part is supported but not "read" (short of 
going directly to ZK). AFAIK COLLECTIONPROP was added because routed aliases 
needed some place to keep additional state, potentially too large / 
inconvenient to stick into state.json.

However, even using these `property.*` values is half-broken, as I recently 
discovered - it's supported in MODIFYCOLLECTION but not in CREATE, due to 
`ClusterStateMutator.createCollection()` copying only the predefined properties 
and ignoring anything else.

This should be fixed in some way - I'm inclined to say in both ways ;) that is, 
the COLLECTIONPROP API should be completed so that it includes the reading 
part, and the CREATE should be fixed to accept `property.*`. And I don't see 
the purpose of propagating these collection-level props to individual cores, so 
this part could be removed until it's needed.
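For reference, the three operations discussed above look roughly like this in 
SolrJ (a hedged sketch: method names as in recent SolrJ, and the collection / 
configset / property names are hypothetical):

{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CollectionPropsSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {

      // CREATE with a custom property.* value (also substituted into
      // ${...} variables in the configset, per the issue description)
      CollectionAdminRequest.Create create =
          CollectionAdminRequest.createCollection("coll", "conf", 1, 1);
      Properties props = new Properties();
      props.setProperty("custom.prop", "customVal"); // sent as property.custom.prop
      create.setProperties(props);
      create.process(client);

      // MODIFYCOLLECTION: sets a property.* attribute in DocCollection
      CollectionAdminRequest.modifyCollection(
          "coll", Map.of("property.custom.prop", "newVal")).process(client);

      // COLLECTIONPROP: write-only today, as noted above
      CollectionAdminRequest.setCollectionProperty("coll", "foo", "bar")
          .process(client);
    }
  }
}
{code}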

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the ref-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}






[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617242#comment-17617242
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

{quote}I think you mean the exact opposite of what you just said?
{quote}
No, I meant that they cannot be set as DocCollection properties; they are 
silently skipped there (while they are indeed propagated to cores). If you want 
to set a DocCollection property you have to use MODIFYCOLLECTION, and while 
this works for setting `property.*` in DocCollection it indeed does not 
propagate these custom props to cores.

Whichever way you look at it, it's a mess.

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the ref-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}






[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-17 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618812#comment-17618812
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

{quote}Should this Jira have a linked "converse" issue: "CREATE collection 
with property.* doesn't set values in DocCollection (even though 
MODIFYCOLLECTION can change them)"?
{quote}
I think so.
{quote}WTF the {{COLLECTIONPROP}} command's purpose / expected usage is?
{quote}
AFAIK it is currently used only for maintaining routed aliases. We could 
extend it to cover the use case of "I want to maintain arbitrary props per 
collection", but then we would have to add the reading API and document it. And 
we would probably have to do some other work too, because this API is isolated 
from the main DocCollection model.

(For me, one reason for abusing DocCollection to keep properties was that 
there's currently no connection between props that you can set with 
COLLECTIONPROP and the replica placement API model, which purposely uses an API 
disconnected from Solr internals. So if I want to mark some collection as 
having this or that replica placement property, 
SolrCollection.getCustomProperty ONLY returns props set in DocCollection and 
not those set with COLLECTIONPROP. Of course, I can always keep these special 
props in a config file specific to the placement plugin ... but this 
complicates the lifecycle of these properties as you create / delete 
collections, so keeping them in DocCollection is convenient.)
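A hedged sketch of that read path from inside a placement plugin (the property 
name is hypothetical; as noted above, values written with COLLECTIONPROP are 
NOT visible here):

{code:java}
import org.apache.solr.cluster.SolrCollection;

class PlacementPropsSketch {
  static boolean hasCustomPlacementPolicy(SolrCollection collection) {
    // getCustomProperty only sees props stored in DocCollection / state.json
    return collection.getCustomProperty("placement.policy") != null;
  }
}
{code}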

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the ref-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}






[jira] [Commented] (SOLR-15616) Allow thread metrics to be cached

2023-01-17 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677899#comment-17677899
 ] 

Andrzej Bialecki commented on SOLR-15616:
-

LGTM, thanks for seeing this through, Ishan!

One minor suggestion: since the interval is expressed in seconds (whereas often 
other intervals are expressed in millis) maybe we should use 
`threadIntervalSec` or something like that? I leave it up to you - the docs say 
it's in seconds but if it's in the name then it's self-explanatory.

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616-2.patch, SOLR-15616-9x.patch, 
> SOLR-15616.patch, SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).






[jira] [Created] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-16649:
---

 Summary: Http2SolrClient.processErrorsAndResponse uses wrong 
instance of ResponseParser
 Key: SOLR-16649
 URL: https://issues.apache.org/jira/browse/SOLR-16649
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: clients - java
Affects Versions: 9.1.1, main (10.0)
Reporter: Andrzej Bialecki


`Http2SolrClient:800` calls the `wantStream(...)` method but passes the wrong 
argument to it - instead of passing the local `processor` arg it uses the 
instance field `parser`.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var `parser` is created that shadows the 
instance field, and then this local `parser` is passed around as an argument 
to various operations. However, in this method the argument passed from the 
caller is named differently (`processor`) and thus does not shadow the 
instance field, which leads to this mistake.
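To illustrate the pattern (a self-contained toy, not the actual 
Http2SolrClient code):

{code:java}
// Toy illustration of the shadowing pattern described above.
class ShadowingExample {
  private String parser = "instance-parser";

  void typicalMethod() {
    // The local var shadows the field, so "parser" below means the local one.
    String parser = "local-parser";
    use(parser); // OK: the local value
  }

  void processErrorsAndResponse(String processor) {
    // Nothing shadows the field here, so this compiles but silently
    // refers to the wrong object:
    use(parser);    // bug: instance field
    use(processor); // intended: the caller's argument
  }

  private void use(String p) {
    System.out.println(p);
  }
}
{code}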






[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Description: 
{{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
wrong argument to it - instead of passing the local {{processor}} arg it uses 
the instance field {{parser}}.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var {{parser}} is created that shadows the 
instance field, and then this local {{parser}} is passed around as an argument 
to various operations. However, in this particular method the argument passed 
from the caller is named differently ({{processor}}) and thus does not shadow 
the instance field, which leads to this mistake.

  was:
`Http2SolrClient:800` calls the `wantStream(...)` method but passes the wrong 
argument to it - instead of passing the local `processor` arg it uses the 
instance field `parser`.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var `parser` is created that shadows the 
instance field, and then this local `parser` is passed around as an argument 
to various operations. However, in this method the argument passed from the 
caller is named differently (`processor`) and thus does not shadow the 
instance field, which leads to this mistake.


> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
>
> {{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
> wrong argument to it - instead of passing the local {{processor}} arg it uses 
> the instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that shadows 
> the instance field, and then this local {{parser}} is passed around as an 
> argument to various operations. However, in this particular method the 
> argument passed from the caller is named differently ({{processor}}) and thus 
> does not shadow the instance field, which leads to this mistake.






[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Attachment: SOLR-16649.patch

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
> wrong argument to it - instead of passing the local {{processor}} arg it uses 
> the instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that shadows 
> the instance field, and then this local {{parser}} is passed around as an 
> argument to various operations. However, in this particular method the 
> argument passed from the caller is named differently ({{processor}}) and thus 
> does not shadow the instance field, which leads to this mistake.






[jira] [Commented] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685416#comment-17685416
 ] 

Andrzej Bialecki commented on SOLR-16649:
-

Simple patch with a test case - it fails with stock code, succeeds with the fix.

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
> wrong argument to it - instead of passing the local {{processor}} arg it uses 
> the instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that shadows 
> the instance field, and then this local {{parser}} is passed around as an 
> argument to various operations. However, in this particular method the 
> argument passed from the caller is named differently ({{processor}}) and thus 
> does not shadow the instance field, which leads to this mistake.






[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Attachment: SOLR-16649-1.patch

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649-1.patch, SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
> wrong argument to it - instead of passing the local {{processor}} arg it uses 
> the instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that shadows 
> the instance field, and then this local {{parser}} is passed around as an 
> argument to various operations. However, in this particular method the 
> argument passed from the caller is named differently ({{processor}}) and thus 
> does not shadow the instance field, which leads to this mistake.






[jira] [Commented] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685747#comment-17685747
 ] 

Andrzej Bialecki commented on SOLR-16649:
-

Oops, right - I attached the new patch.

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649-1.patch, SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls the {{wantStream(...)}} method but passes the 
> wrong argument to it - instead of passing the local {{processor}} arg it uses 
> the instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that shadows 
> the instance field, and then this local {{parser}} is passed around as an 
> argument to various operations. However, in this particular method the 
> argument passed from the caller is named differently ({{processor}}) and thus 
> does not shadow the instance field, which leads to this mistake.






[jira] [Commented] (SOLR-16507) Remove NodeStateProvider & Snitch

2023-03-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703331#comment-17703331
 ] 

Andrzej Bialecki commented on SOLR-16507:
-

Thanks [~dsmiley] for bringing this to my attention.

The reason we used the existing NodeStateProvider abstraction in the replica 
placement code was that retrieving per-node metrics is messy and quirky, all of 
which is hidden in SolrClientNodeStateProvider.

The internal structure (snitches and co) can and should be refactored and 
simplified, because these concepts are not used anywhere else anymore; they are 
legacy abstractions from the time when they were used for the collection rules 
DSL.

However, IMHO something like NodeStateProvider still has its place. No matter 
what you replace it with, the complexity of retrieving per-node attributes will 
still be present somewhere - and hiding it in a NodeStateProvider (or similar 
concept) as a high-level API at least gives us the possibility of reuse. If we 
were to put all this nasty code into AttributeFetcherImpl then we would pretty 
much limit its usefulness to the placement code alone.

SolrCloudManager is perhaps no longer useful and can be factored out, but IMHO 
something equivalent to NodeStateProvider is still needed.

Re. "snitchSession" - this is now used only in `ImplicitSnitch` for caching the 
node roles, in order to avoid loading this data from ZK for every node.
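To make the reuse argument concrete, here is a hedged sketch of the kind of 
disk-space lookup SplitShardCmd needs (illustrative, not the actual 
SplitShardCmd code; package locations and tag names vary across versions):

{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.solr.client.solrj.cloud.NodeStateProvider;
import org.apache.solr.common.cloud.rule.ImplicitSnitch;

// All the HTTP / metrics-parsing plumbing stays hidden behind NodeStateProvider.
class DiskCheckSketch {
  static double freeDiskGb(NodeStateProvider nsp, String nodeName) {
    Map<String, Object> values =
        nsp.getNodeValues(nodeName, Collections.singletonList(ImplicitSnitch.DISK));
    Number free = (Number) values.get(ImplicitSnitch.DISK); // "freedisk"
    return free == null ? -1 : free.doubleValue();
  }
}
{code}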

> Remove NodeStateProvider & Snitch
> -
>
> Key: SOLR-16507
> URL: https://issues.apache.org/jira/browse/SOLR-16507
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
>Priority: Major
>  Labels: newdev
>
> The NodeStateProvider is a relic relating to the old autoscaling framework 
> that was removed in Solr 9.  The only remaining usage of it is for 
> SplitShardCmd to check the disk space.  For this, it could use the metrics 
> api.
> I think we'll observe that Snitch and other classes in 
> org.apache.solr.common.cloud.rule can be removed as well, as it's related to 
> NodeStateProvider.
> Only 
> org.apache.solr.cluster.placement.impl.AttributeFetcherImpl#getMetricSnitchTag
>  and org.apache.solr.cluster.placement.impl.NodeMetricImpl refer to some 
> constants in the code to be removed.  Those constants could move out, 
> consolidated somewhere we think is appropriate.






[jira] [Commented] (SOLR-16507) Remove NodeStateProvider & Snitch

2023-03-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703746#comment-17703746
 ] 

Andrzej Bialecki commented on SOLR-16507:
-

bq. Do you think there's a point to SplitShardCmd using NodeStateProvider vs 
just going to the metrics API?

You're making my point for me ;) you could do that, but SplitShardCmd would 
become very complex, with a lot of non-reusable code - because you would still 
have to bake in making HTTP requests to other nodes and parsing / extracting 
metrics values. So it's better to hide this complexity in a high-level utility 
API.

IMHO NodeStateProvider is a good abstraction, just its implementation needs to 
be cleaned up.

> Remove NodeStateProvider & Snitch
> -
>
> Key: SOLR-16507
> URL: https://issues.apache.org/jira/browse/SOLR-16507
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
>Priority: Major
>  Labels: newdev
>
> The NodeStateProvider is a relic relating to the old autoscaling framework 
> that was removed in Solr 9.  The only remaining usage of it is for 
> SplitShardCmd to check the disk space.  For this, it could use the metrics 
> api.
> I think we'll observe that Snitch and other classes in 
> org.apache.solr.common.cloud.rule can be removed as well, as it's related to 
> NodeStateProvider.
> Only 
> org.apache.solr.cluster.placement.impl.AttributeFetcherImpl#getMetricSnitchTag
>  and org.apache.solr.cluster.placement.impl.NodeMetricImpl refer to some 
> constants in the code to be removed.  Those constants could move out, 
> consolidated somewhere we think is appropriate.






[jira] [Created] (SOLR-17138) Support other QueryTimeout criteria

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17138:
---

 Summary: Support other QueryTimeout criteria
 Key: SOLR-17138
 URL: https://issues.apache.org/jira/browse/SOLR-17138
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki


Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.
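As a rough illustration of the CPU criterion, here is a minimal, hedged sketch 
(not the actual Solr implementation; the class and field names are made up) of 
a check built on the standard `ThreadMXBean`:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Minimal sketch of a per-thread CPU budget check, analogous to
// QueryTimeout.shouldExit(). Assumes the check runs on the same thread
// that executes the query.
public class CpuTimeLimitSketch {
  private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
  private final long limitNanos;
  private final long startCpuNanos;

  public CpuTimeLimitSketch(long limitMillis) {
    this.limitNanos = limitMillis * 1_000_000L;
    // CPU time already consumed by the current thread, in nanoseconds
    this.startCpuNanos = threadBean.getCurrentThreadCpuTime();
  }

  public boolean shouldExit() {
    return threadBean.getCurrentThreadCpuTime() - startCpuNanos > limitNanos;
  }
}
{code}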






[jira] [Updated] (SOLR-17138) Support other QueryTimeout criteria

2024-01-30 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17138:

Description: 
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.

  was:
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.

[jira] [Created] (SOLR-17140) Refactor SolrQueryTimeoutImpl to support other implementations

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17140:
---

 Summary: Refactor SolrQueryTimeoutImpl to support other 
implementations
 Key: SOLR-17140
 URL: https://issues.apache.org/jira/browse/SOLR-17140
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki









[jira] [Created] (SOLR-17141) Create CpuQueryTimeout implementation

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17141:
---

 Summary: Create CpuQueryTimeout implementation
 Key: SOLR-17141
 URL: https://issues.apache.org/jira/browse/SOLR-17141
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


This class will use `getThreadCpuTime` to determine when to signal `shouldExit`.






[jira] [Assigned] (SOLR-17141) Create CpuQueryTimeout implementation

2024-01-30 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-17141:
---

Assignee: Andrzej Bialecki

> Create CpuQueryTimeout implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Created] (SOLR-17150) Create MemQueryLimit implementation

2024-02-05 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17150:
---

 Summary: Create MemQueryLimit implementation
 Key: SOLR-17150
 URL: https://issues.apache.org/jira/browse/SOLR-17150
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


An implementation of {{QueryTimeout}} that terminates misbehaving queries that 
allocate too much memory for their execution.

This is a bit more complicated than {{CpuQueryLimit}} because the first time a 
query is submitted it may legitimately allocate many sizeable objects (caches, 
field values, etc). So we want to catch and terminate queries that either 
exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
time-weighted percentile of the recent queries.
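For context, a hedged sketch of how the per-thread allocation number can be 
obtained (the cast to the {{com.sun.management}} extension works on 
HotSpot-based JVMs; the class and method names of the sketch itself are made 
up):

{code:java}
import java.lang.management.ManagementFactory;

// Per-thread allocated bytes come from the com.sun.management extension of
// ThreadMXBean, not from the standard java.lang.management API.
class AllocationProbeSketch {
  private static final com.sun.management.ThreadMXBean BEAN =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

  /** Bytes allocated so far by the current thread. */
  static long allocatedBytesSoFar() {
    return BEAN.getThreadAllocatedBytes(Thread.currentThread().getId());
  }
}
{code}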






[jira] [Updated] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Summary: Create CpuQueryLimit implementation  (was: Create CpuQueryTimeout 
implementation)

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Updated] (SOLR-17138) Support other QueryTimeout criteria

2024-02-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17138:

Description: 
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing {{QueryTimeout}} API enables such termination of individual 
queries. However, the existing implementation ({{SolrQueryTimeoutImpl}} used 
with {{timeAllowed}} query param) only uses elapsed wall-clock time as the 
termination criterion. This is insufficient - in case of resource contention 
the wall-clock time doesn’t represent correctly the actual CPU cost of 
executing a particular query. A query may produce results after a long time not 
because of its complexity or bad behavior but because of the general resource 
contention caused by other concurrently executing queries. OTOH a single 
runaway query may consume all resources and cause all other valid queries to 
fail if they exceed the wall-clock {{timeAllowed}}.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using {{getThreadCpuTime}} to periodically check 
({{QueryTimeout.shouldExit()}}) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using {{getThreadAllocatedBytes}}.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every {{shouldExit()}} call without undue burden. If we decide 
that this still adds too much overhead we can change this to periodic updates 
in a background thread.

These two "query budget" constraints can be implemented as subclasses of 
{{QueryTimeout}}. Initially we can use a similar configuration mechanism as 
with {{timeAllowed}}, i.e. pass the max value as a query param, or add it to 
the search handler's invariants.

  was:
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.

[jira] [Created] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-05 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17151:
---

 Summary: Review current usage of QueryLimits to ensure complete 
coverage
 Key: SOLR-17151
 URL: https://issues.apache.org/jira/browse/SOLR-17151
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki


Resource usage by a query is not limited to the actual search within 
{{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
significantly contribute to this usage, either before or after the 
{{QueryComponent}}.

Those components that already use {{QueryTimeout}} either directly or 
indirectly will properly observe the limits and terminate if needed. However, 
other components may be expensive or misbehaving but fail to observe the limits 
imposed on the end-to-end query processing.

One such obvious place where we could add this check is where the 
{{SearchHandler}} loops over {{SearchComponent}}s - it should explicitly call 
{{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
component ignored the limits, they will still be enforced at the 
{{SearchHandler}} level. There may be other places like this, too.
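A hedged sketch of that check ({{QueryLimits}} below stands in for whatever 
object ends up exposing {{shouldExit()}}; the exception type and message are 
illustrative, not the actual SearchHandler code):

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.solr.common.SolrException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

class LimitEnforcingLoopSketch {
  interface QueryLimits {
    boolean shouldExit();
  }

  static void processComponents(List<SearchComponent> components,
                                ResponseBuilder rb,
                                QueryLimits limits) throws IOException {
    for (SearchComponent component : components) {
      component.process(rb);
      // Enforce limits between components, even if the component itself
      // never consulted them.
      if (limits.shouldExit()) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            "Query limits exceeded after component: " + component.getName());
      }
    }
  }
}
{code}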






[jira] [Commented] (SOLR-17150) Create MemQueryLimit implementation

2024-02-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815673#comment-17815673
 ] 

Andrzej Bialecki commented on SOLR-17150:
-

Here's the proposed approach to implement two thresholds:
 * an absolute max limit to terminate any query that exceeds this allocation
 * a relative dynamic limit to terminate queries that exceed "typical" 
allocation

For the absolute limit: as with other implementations, {{memAllowed}} would set 
the absolute limit per query (float value in megabytes?). In order to 
accommodate initial queries this should be set to a relatively high value, 
which isn't optimal later for typical queries - this higher limit will 
eventually catch runaway queries but not before they consume significant memory.

For the dynamic limit: a histogram would be added to the metrics to track the 
recent memory usage per query (using an exponentially decaying reservoir). The 
life-cycle of the histogram could be tied either to SolrCore or to 
SolrIndexSearcher (the latter seems more appropriate because of the warmup 
queries that would skew the longer-term stats in SolrCore's life-cycle).

After collecting a sufficient number of data points (eg. {{N = 100}}) the 
component could start enforcing a dynamic limit based on a formula that takes 
into account the "typical" recent queries. For example: {{dynamicThreshold = 
X * p99}}, where {{X = 2.0}} by default.
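A hedged sketch of this dynamic threshold, using the Dropwizard Metrics 
histogram with an exponentially decaying reservoir (the {{N}} and {{X}} 
defaults are the assumptions from above, not settled values):

{code:java}
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Histogram;

// Sketch only: records per-query allocations and flags queries that exceed
// either the absolute limit or FACTOR * p99 of recent queries.
class DynamicMemLimitSketch {
  private static final int MIN_DATA_POINTS = 100; // "N" above
  private static final double FACTOR = 2.0;       // "X" above

  private final Histogram perQueryAllocations =
      new Histogram(new ExponentiallyDecayingReservoir());
  private final long absoluteMaxBytes;

  DynamicMemLimitSketch(long absoluteMaxBytes) {
    this.absoluteMaxBytes = absoluteMaxBytes;
  }

  /** Record the allocation of a completed query. */
  void record(long allocatedBytes) {
    perQueryAllocations.update(allocatedBytes);
  }

  /** True if this allocation exceeds the absolute or the dynamic threshold. */
  boolean exceeded(long allocatedBytes) {
    if (allocatedBytes > absoluteMaxBytes) {
      return true;
    }
    if (perQueryAllocations.getCount() >= MIN_DATA_POINTS) {
      double p99 = perQueryAllocations.getSnapshot().get99thPercentile();
      return allocatedBytes > FACTOR * p99;
    }
    return false;
  }
}
{code}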

Open issues:
 * does the dynamic threshold make sense? does the formula make sense?
 * I think that both the static and dynamic limits should be optional, i.e. some 
combination of query params should allow the user to skip the enforcement of 
either / both.
 * since the dynamic limit involves parameters (at least N and X above) that 
determine long-term tracking, it can no longer be expressed just as short-lived 
query params; it needs a configuration with a life-cycle of SolrCore or longer. 
Where should we put this configuration?

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimit}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.






[jira] [Created] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-09 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17158:
---

 Summary: Terminate distributed processing quickly when query limit 
is reached
 Key: SOLR-17158
 URL: https://issues.apache.org/jira/browse/SOLR-17158
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Limits
Reporter: Andrzej Bialecki


Solr should make sure that when query limits are reached and partial results 
are not needed (and not wanted), processing is terminated as quickly as 
possible both in the shards and in the query coordinator. Solr should minimize 
wasted resources spent on eg. returning data from the remaining shards, merging 
responses in the coordinator, or returning any data back to the user.






[jira] [Commented] (SOLR-17138) Support other QueryTimeout criteria

2024-02-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816106#comment-17816106
 ] 

Andrzej Bialecki commented on SOLR-17138:
-

Here are results from a set of simple JMH benchmarks in which the code calls 
the respective method from 100 threads. Results are in nanoseconds / call / 
thread. Both methods are supported and enabled by default on all tested JVMs.
 * Results on a MacBook M1 Max, macOS Sonoma:

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|Azul Zulu 11|95|757|
|OpenJDK 17|72|730|
|OpenJDK 21|83|819|

 * Results on a Linux VM (on a Kubernetes cluster) running Ubuntu 22.04:

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|OpenJDK 11|40|238|
|OpenJDK 17|36|239|
|OpenJDK 21|41|236|

 * Results on a Windows VM (on a Kubernetes cluster) running Windows Server 
Core 10:

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|OpenJDK 11|108|440|
|Oracle Java 17|103|426|
|Oracle Java 21|105|447|
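For reference, a hedged sketch of the kind of JMH probe behind these numbers 
(not the exact benchmark code that produced them):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class ThreadCpuTimeBench {
  private final ThreadMXBean bean = ManagementFactory.getThreadMXBean();

  @Benchmark
  @Threads(100) // matches the 100-thread setup described above
  public long currentThreadCpuTime() {
    return bean.getCurrentThreadCpuTime();
  }
}
{code}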

> Support other QueryTimeout criteria
> ---
>
> Key: SOLR-17138
> URL: https://issues.apache.org/jira/browse/SOLR-17138
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Complex Solr queries can consume significant memory and CPU while being 
> processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
> which further compounds the problem. Often such “killer queries” are not 
> written to logs, which makes them difficult to diagnose. This happens even 
> with best practices in place.
> It should be possible to set limits in Solr that cannot be exceeded by 
> individual queries. This mechanism would monitor an accumulating “cost” of a 
> query while it’s being executed and compare it to the configured maximum cost 
> (budget), expressed in terms of CPU and/or memory usage that can be 
> attributed to this query. Should these limits be exceeded the individual 
> query execution should be terminated, without affecting other concurrently 
> executing queries.
> The CircuitBreakers functionality doesn't distinguish the source of the load 
> and can't protect other query executions from a particular runaway query. We 
> need a more fine-grained mechanism.
> The existing {{QueryTimeout}} API enables such termination of individual 
> queries. However, the existing implementation ({{SolrQueryTimeoutImpl}} used 
> with {{timeAllowed}} query param) only uses elapsed wall-clock time as the 
> termination criterion. This is insufficient - in case of resource contention 
> the wall-clock time doesn’t represent correctly the actual CPU cost of 
> executing a particular query. A query may produce results after a long time 
> not because of its complexity or bad behavior but because of the general 
> resource contention caused by other concurrently executing queries. OTOH a 
> single runaway query may consume all resources and cause all other valid 
> queries to fail if they exceed the wall-clock {{timeAllowed}}.
> I propose adding two additional criteria for limiting the maximum "query 
> budget":
>  * per-thread CPU time: using {{getThreadCpuTime}} to periodically check 
> ({{QueryTimeout.shouldExit()}}) the current CPU consumption since the start 
> of the query execution.
>  * per-thread memory allocation: using {{getThreadAllocatedBytes}}.
> I ran some JMH microbenchmarks to ensure that these two methods are available 
> on modern OS/JVM combinations and their cost is negligible (less than 0.5 
> us/call). This means that the initial implementation may call these methods 
> directly for every {{shouldExit()}} call without undue burden. If we decide 
> that this still adds too much overhead we can change this to periodic updates 
> in a background thread.
> These two "query budget" constraints can be implemented as subclasses of 
> {{QueryTimeout}}. Initially we can use a similar configuration mechanism as 
> with {{timeAllowed}}, i.e. pass the max value as a query param, or add it to 
> the search handler's invariants.






[jira] [Assigned] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-09 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-17158:
---

Assignee: Andrzej Bialecki

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing is terminated as quickly as 
> possible both in the shards and in the query coordinator. Solr should 
> minimize wasted resources spent on eg. returning data from the remaining 
> shards, merging responses in the coordinator, or returning any data back to 
> the user.






[jira] [Commented] (SOLR-16986) Measure and aggregate thread CPU time in distributed search

2024-02-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816424#comment-17816424
 ] 

Andrzej Bialecki commented on SOLR-16986:
-

Ok. In any case, this has a bug in that it ignores all but the first time 
measurement when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140.

> Measure and aggregate thread CPU time in distributed search
> ---
>
> Key: SOLR-16986
> URL: https://issues.apache.org/jira/browse/SOLR-16986
> Project: Solr
>  Issue Type: New Feature
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Solr responses include "QTime", which in retrospect might have been better 
> named "elapsedTime".  We propose adding here a "cpuTime" to return the amount 
> of time consumed by 
> ManagementFactory.getThreadMXBean().[getThreadCpuTime|https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/ThreadMXBean.html]().
>   Unlike QTime, this will need to be aggregated across distributed requests.  
> This work item will only do the aggregation work for distributed search, 
> although it could be extended for other scenarios in future work items.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-16986) Measure and aggregate thread CPU time in distributed search

2024-02-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816424#comment-17816424
 ] 

Andrzej Bialecki edited comment on SOLR-16986 at 2/11/24 2:16 PM:
--

Ok. In any case, this has a bug in that it ignores all but the first time 
measure when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140 so that the CPU time 
logged and the CPU time limit enforced by {{CpuQueryTimeLimit}} are consistent.


was (Author: ab):
Ok. In any case, this has a bug in that it ignores all but the first time 
measure when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140.

> Measure and aggregate thread CPU time in distributed search
> ---
>
> Key: SOLR-16986
> URL: https://issues.apache.org/jira/browse/SOLR-16986
> Project: Solr
>  Issue Type: New Feature
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Solr responses include "QTime", which in retrospect might have been better 
> named "elapsedTime".  We propose adding here a "cpuTime" to return the amount 
> of time consumed by 
> ManagementFactory.getThreadMXBean().[getThreadCpuTime|https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/ThreadMXBean.html]().
>   Unlike QTime, this will need to be aggregated across distributed requests.  
> This work item will only do the aggregation work for distributed search, 
> although it could be extended for other scenarios in future work items.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816644#comment-17816644
 ] 

Andrzej Bialecki commented on SOLR-17141:
-

[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{SolrRequestInfo}}). We would like to fix this as part of the 
SOLR-17138 refactoring, and to avoid potential confusion when logged CPU time 
is different than the CPU time limit set here. This can be done when both the 
{{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point but keep 
track of nested requests.

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816644#comment-17816644
 ] 

Andrzej Bialecki edited comment on SOLR-17141 at 2/12/24 3:29 PM:
--

[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{SolrRequestInfo}}). We would like to fix this as part of the 
SOLR-17138 refactoring, and to avoid potential confusion when logged CPU time 
is different than the CPU time limit set here. This can be done when both the 
{{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point but keep 
track of nested requests.


was (Author: ab):
[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{{}SolrRequestInfo{}}}. We would like to fix this as part of 
the SOLR-17138 refactoring, and to avoid potential confusion when logged CPU 
time is different than the CPU time limit set here. This can be done when both 
the {{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point but 
keep track of nested requests.

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Summary: Create CpuAllowedLimit implementation  (was: Create CpuQueryLimit 
implementation)

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Fix Version/s: 9.6.0

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-17141.
-
Resolution: Fixed

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-21 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17172:
---

 Summary: Add QueryLimits termination to existing heavy 
SearchComponent-s
 Key: SOLR-17172
 URL: https://issues.apache.org/jira/browse/SOLR-17172
 Project: Solr
  Issue Type: Sub-task
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


The purpose of this ticket is to review the existing {{SearchComponent}}-s that 
perform intensive tasks to see if they could be modified to check the 
{{QueryLimits.shouldExit()}} inside their execution.

This is not meant to be included in tight loops but to prevent individual 
components from completing multiple stages of costly work that will be 
discarded anyway on the exit from the component due to the exceeded limits 
(SOLR-17151).
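
As a rough illustration of the kind of check meant here (the component and its 
stages are made up; only {{QueryLimits.getCurrentLimits()}} and 
{{shouldExit()}} correspond to the accessors discussed in this work):

{code:java}
import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.QueryLimits;

// Hypothetical heavy component, not an actual Solr class: it checks the
// current limits between two costly stages so that work which would be
// discarded anyway is skipped once a limit is tripped. The check sits
// between major stages, not in a tight loop.
public class HeavyComponentSketch extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {}

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    runExpensiveStageOne(rb);
    if (QueryLimits.getCurrentLimits().shouldExit()) {
      return; // abandon remaining stages; SearchHandler reports partial results
    }
    runExpensiveStageTwo(rb);
  }

  @Override
  public String getDescription() {
    return "heavy component sketch";
  }

  private void runExpensiveStageOne(ResponseBuilder rb) { /* made-up stage */ }

  private void runExpensiveStageTwo(ResponseBuilder rb) { /* made-up stage */ }
}
{code}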



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819299#comment-17819299
 ] 

Andrzej Bialecki commented on SOLR-17151:
-

Let's focus here on improving the checking between components, as opposed to 
SOLR-17172.

> Review current usage of QueryLimits to ensure complete coverage
> ---
>
> Key: SOLR-17151
> URL: https://issues.apache.org/jira/browse/SOLR-17151
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Resource usage by a query is not limited to the actual search within 
> {{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
> significantly contribute to this usage, either before or after the 
> {{QueryComponent}}.
> Those components that already use {{QueryTimeout}} either directly or 
> indirectly will properly observe the limits and terminate if needed. However, 
> other components may be expensive or misbehaving but fail to observe the 
> limits imposed on the end-to-end query processing.
> One such obvious place where we could add this check is where the 
> {{SearchHandler}} loops over {{SearchComponent}}-s - it should explicitly call 
> {{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
> component ignored the limits they will still be enforced at the 
> {{SearchHandler}} level. There may be other places like this, too.
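
A schematic sketch of such a handler-level check between components 
(simplified; not the actual {{SearchHandler}} code):

{code:java}
// Schematic only - not the actual SearchHandler code. The idea: even if a
// component ignores the limits internally, the handler stops invoking further
// components once a limit is tripped.
for (SearchComponent c : components) {
  c.process(rb);
  QueryLimits limits = QueryLimits.getCurrentLimits();
  if (limits.isLimitsEnabled() && limits.shouldExit()) {
    rb.rsp.getResponseHeader().add("partialResults", true); // schematic flag
    break;
  }
}
{code}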



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819299#comment-17819299
 ] 

Andrzej Bialecki edited comment on SOLR-17151 at 2/21/24 3:49 PM:
--

Let's focus here on improving the checking between components as opposed to 
SOLR-17172.


was (Author: ab):
Let's focus here on improving the checking between components as opposed on 
SOLR-17172.

> Review current usage of QueryLimits to ensure complete coverage
> ---
>
> Key: SOLR-17151
> URL: https://issues.apache.org/jira/browse/SOLR-17151
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Resource usage by a query is not limited to the actual search within 
> {{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
> significantly contribute to this usage, either before or after the 
> {{QueryComponent}}.
> Those components that already use {{QueryTimeout}} either directly or 
> indirectly will properly observe the limits and terminate if needed. However, 
> other components may be expensive or misbehaving but fail to observe the 
> limits imposed on the end-to-end query processing.
> One such obvious place where we could add this check is where the 
> {{SearchHandler}} loops over {{SearchComponent}}-s - it should explicitly call 
> {{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
> component ignored the limits they will still be enforced at the 
> {{SearchHandler}} level. There may be other places like this, too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819319#comment-17819319
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

Adding some observations from reading the code in {{SolrIndexSearcher}} and 
{{HttpShardHandler}}.

It appears that currently when {{timeAllowed}} is reached it doesn’t cause 
termination of all other pending shard requests. I found this section in 
{{SolrIndexSearcher:284}}:

{code:java}
try {
  super.search(query, collector);
} catch (TimeLimitingCollector.TimeExceededException
    | ExitableDirectoryReader.ExitingReaderException
    | CancellableCollector.QueryCancelledException x) {
  log.warn("Query: [{}]; ", query, x);
  qr.setPartialResults(true);
{code}

When the {{timeAllowed}} limit is reached (and our new {{QueryLimits}}, too) it 
simply sets {{partialResults=true}} and does NOT throw any exception, so all 
the layers above think that the result is a success.

I suspect the reason for this was that when {{timeAllowed}} was set we still 
wanted to retrieve partial results when the limit was hit, and throwing an 
exception here would prevent that.

OTOH, if we had a request param saying “discard everything when you reach a 
limit and cancel any ongoing requests” then we could throw an exception here, 
and {{ShardHandler}} would recognize this as an error and cancel all other 
shard requests that are still pending, so that replicas could avoid sending 
back their results that would be discarded anyway.
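
A sketch of what that variant could look like at the code site quoted above; 
the {{wantPartialResults}} flag is hypothetical, standing in for whatever 
request param would be added:

{code:java}
// Hypothetical variant of the catch block quoted above; 'wantPartialResults'
// is a made-up flag derived from the proposed request param.
try {
  super.search(query, collector);
} catch (ExitableDirectoryReader.ExitingReaderException x) {
  if (wantPartialResults) {
    qr.setPartialResults(true); // current behavior: report success + flag
  } else {
    // propagate as an error so that ShardHandler cancels the remaining
    // pending shard requests instead of collecting their results
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "Query limits reached and partial results not requested", x);
  }
}
{code}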

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted) then both the processing in shards and in the 
> query coordinator should be terminated as quickly as possible, and Solr 
> should minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820030#comment-17820030
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

FYI, it was necessary to add this parameter in SOLR-17172. I used 
{{partialResults=true}} to mean that we should stop processing and return 
partial results with a "success" code and a "partialResults" flag in the 
response, and {{partialResults=false}} to mean that we should throw an 
exception and discard any partial results.
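
For illustration, the two modes could be requested from SolrJ roughly like this 
(assuming the param is passed as a plain query param):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

// Illustrative only: the same query issued in the two modes described above.
SolrQuery q = new SolrQuery("*:*");
q.set("timeAllowed", 500);      // some limit that may be tripped
q.set("partialResults", true);  // stop early, return partial results + flag
// ...or:
q.set("partialResults", false); // throw an exception, discard partial results
{code}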

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted) then both the processing in shards and in the 
> query coordinator should be terminated as quickly as possible, and Solr 
> should minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-26 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820781#comment-17820781
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

I'm not convinced we need a sysprop here... why shouldn't we use the request 
handler's {{defaults}} and {{invariants}} sections in {{solrconfig.xml}}? 
Using a sysprop effectively enforces the same default behavior for all replicas 
of all collections managed by this Solr node.

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted) then both the processing in shards and in the 
> query coordinator should be terminated as quickly as possible, and Solr 
> should minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17172:

Fix Version/s: 9.6.0

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17172:

Component/s: Query Limits

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-17172.
-
Resolution: Fixed

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17182) Eliminate the need for 'solr.useExitableDirectoryReader' sysprop

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17182:

Component/s: Query Limits

> Eliminate the need for 'solr.useExitableDirectoryReader' sysprop
> 
>
> Key: SOLR-17182
> URL: https://issues.apache.org/jira/browse/SOLR-17182
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Chris M. Hostetter
>Priority: Major
>
> As the {{QueryLimit}} functionality in Solr gets beefed up, and supports 
> multiple types of limits, it would be nice if we could find a way to 
> eliminate the need for the {{solr.useExitableDirectoryReader}} sysprop, and 
> instead just have codepaths that use the underlying IndexReader  (like 
> faceting, spellcheck, etc...)  automatically get a reader that enforces the 
> limits if/when limits are in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-17199) EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from solr-core

2024-03-07 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17199:
---

 Summary: EnvUtils in solr-solrj is missing 
EnvToSyspropMappings.properties from solr-core
 Key: SOLR-17199
 URL: https://issues.apache.org/jira/browse/SOLR-17199
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


Initially in SOLR-15960 {{EnvUtils}} was located in solr-core, together with 
its configuration resource {{EnvToSyspropMappings.properties}}. Then it was 
moved from solr-core to solr-solrj, but the configuration resource was left in 
solr-core.

This unfortunately means that {{EnvUtils}} cannot be used without a dependency 
on solr-core, unless the user adds their own copy of the configuration resource 
to the classpath. Right now trying to use it (or using {{PropertiesUtil}} for 
property substitution) results in an exception from the static initializer:
{code}
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:209)
at org.apache.solr.common.util.EnvUtils.<clinit>(EnvUtils.java:51)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17199) EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from solr-core

2024-03-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824384#comment-17824384
 ] 

Andrzej Bialecki commented on SOLR-17199:
-

I didn't see it - thanks for fixing it!

> EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from 
> solr-core
> 
>
> Key: SOLR-17199
> URL: https://issues.apache.org/jira/browse/SOLR-17199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 9.5.0
>Reporter: Andrzej Bialecki
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 9.6.0
>
>
> Initially in SOLR-15960 {{EnvUtils}} was located in solr-core, together with 
> its configuration resource {{EnvToSyspropMappings.properties}}. Then it was 
> moved from solr-core to solr-solrj, but the configuration resource was left 
> in solr-core.
> This unfortunately means that {{EnvUtils}} cannot be used without a dependency 
> on solr-core, unless the user adds their own copy of the configuration 
> resource to the classpath. Right now trying to use it (or using 
> {{PropertiesUtil}} for property substitution) results in an exception from 
> the static initializer:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:209)
>   at org.apache.solr.common.util.EnvUtils.<clinit>(EnvUtils.java:51)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-04-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833944#comment-17833944
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

[~dsmiley] these are not exactly equivalent - when a limit is reached it 
doesn't have to be related in any way to per-shard processing.

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted) then both the processing in shards and in the 
> query coordinator should be terminated as quickly as possible, and Solr 
> should minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17150) Create MemQueryLimit implementation

2024-04-30 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842339#comment-17842339
 ] 

Andrzej Bialecki commented on SOLR-17150:
-

After discussing this with other people, it looks like dynamic limits would be 
tricky to set properly: the interaction between occasional legitimate spikes of 
heavier query traffic, updates (which would trigger a searcher re-open and a 
memory usage spike) and other factors could cause too many failures.

Still, having support for a hard limit to prevent a total run-away that would 
result in OOM seems useful. I'll prepare another patch that contains just the 
hard limit.
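
A minimal sketch of such a hard limit, assuming a HotSpot-style JVM that 
exposes {{com.sun.management.ThreadMXBean}} (names are illustrative, not the 
actual patch):

{code:java}
import java.lang.management.ManagementFactory;

// Illustrative sketch of a hard per-thread allocation cap; not the actual patch.
// Requires a JVM whose ThreadMXBean is the com.sun.management variant (HotSpot).
public class MemBudgetSketch {
  private static final com.sun.management.ThreadMXBean TMX =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

  private final long limitBytes; // hard cap, e.g. 2GB as mentioned above
  private final long startBytes; // bytes allocated by this thread at query start

  public MemBudgetSketch(long limitBytes) {
    this.limitBytes = limitBytes;
    this.startBytes = TMX.getThreadAllocatedBytes(Thread.currentThread().getId());
  }

  // true once the query's thread has allocated more than the hard cap
  public boolean shouldExit() {
    long allocated =
        TMX.getThreadAllocatedBytes(Thread.currentThread().getId()) - startBytes;
    return allocated > limitBytes;
  }
}
{code}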

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimits}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search

2024-05-10 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845258#comment-17845258
 ] 

Andrzej Bialecki commented on SOLR-13350:
-

This is caused by breaking the end-to-end tracking of request context in 
{{SolrRequestInfo}}, which uses a thread-local deque to provide the same 
context for both the main and all sub-requests. This tracking is needed to set 
up the correct query timeout instance ({{QueryLimits}}) on the searcher for 
time-limited searches in {{SolrIndexSearcher:727}}. However, now that this 
method is executed in a separate "searcherCollector" thread, the 
{{SolrRequestInfo}} instance it obtains is empty because it doesn't match the 
original thread that set it.
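
A rough sketch of the kind of fix this implies - capture the caller's 
{{SolrRequestInfo}} and re-attach it inside the worker thread ({{doSearch}} is 
a placeholder):

{code:java}
import java.util.concurrent.Callable;
import org.apache.solr.request.SolrRequestInfo;

// Sketch only: propagate the parent request context into the worker thread so
// that thread-local lookups (e.g. for QueryLimits) see the same SolrRequestInfo.
final SolrRequestInfo parent = SolrRequestInfo.getRequestInfo();
Callable<Object> task = () -> {
  SolrRequestInfo.setRequestInfo(parent); // re-attach in the worker thread
  try {
    return doSearch(); // placeholder for the work done in "searcherCollector"
  } finally {
    SolrRequestInfo.clearRequestInfo();
  }
};
{code}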

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13350.patch, SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from latency 
> and throughput perspective.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search

2024-05-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848668#comment-17848668
 ] 

Andrzej Bialecki commented on SOLR-13350:
-

{quote}As of now, the timeAllowed requests are anyway executed without 
multithreading
{quote}
This is based on a {{QueryCommand.timeAllowed}} flag that is set only from the 
{{timeAllowed}} param. However, this concept was extended in SOLR-17138 to 
{{QueryLimits}}, which is now initialized also using other params. There is 
indeed some inconsistency here that's a left-over from that change, in the 
sense that {{QueryCommand.timeAllowed}} should have been either removed 
completely or replaced with something like {{queryLimits}}, to make sure to 
check the current SolrRequestInfo for QueryLimits.

In any case, the minimal workaround for this could be to check 
{{QueryLimits.getCurrentLimits().isLimitsEnabled()}} instead of 
{{QueryCommand.timeAllowed}}. But a better fix would be to properly unbreak the 
tracking of the parent {{SolrRequestInfo}} in MT search.

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13350.patch, SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from latency 
> and throughput perspective.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results

2024-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878554#comment-17878554
 ] 

Andrzej Bialecki commented on SOLR-17416:
-

+1 for the proposed immediate fix.

> Streaming Expressions:  Exception swallowed and not propagated back to the 
> client leading to inconsistent results
> -
>
> Key: SOLR-17416
> URL: https://issues.apache.org/jira/browse/SOLR-17416
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Export Writer, streaming expressions
>Reporter: Lamine
>Priority: Major
> Attachments: SOLR-17416.patch
>
>
> There appears to be a bug in the _ExportWriter/ExportBuffers_ implementation 
> within the Streaming Expressions plugin. Specifically, when an 
> InterruptedException occurs due to an ExportBuffers timeout, the exception is 
> swallowed and not propagated back to the client (still logged on the server 
> side though).
> As a result, the client receives an EOF marker, thinking that it has received 
> the full set of results, when in fact it has only received partial results. 
> This leads to inconsistent search results, as the client is unaware that the 
> export process was interrupted and terminated prematurely.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17430) Redesign ExportWriter / ExportBuffers to work better with large batchSizes and slow consumption

2024-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878556#comment-17878556
 ] 

Andrzej Bialecki commented on SOLR-17430:
-

Originally this design was an evolution of a single buffer-based older design, 
where the "filler" and "writer" phases ran sequentially in the same thread. I 
agree that something we initially thought would be a simple extension ended up 
quite complicated :) 

[~jbernste] and I ran several benchmarks using the old and the current design, 
which showed big performance improvements in the current design. I think that 
these speedups benefited from the bulk (buffer-based) operations for both read 
and write sides of the process. Using a queue definitely simplifies the design 
but I'm worried we may lose some of these performance gains when processing is 
done item-by-item and not in bulk. OTOH this may not be such a huge factor 
overall, and if it allows us to simplify the code and better control the flow, 
then it may be worth it even with some performance penalty.
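
One possible middle ground, sketched here only as an illustration: keep a queue 
for flow control but still transfer documents in bulk via {{drainTo}} 
({{Doc}} and {{writeBatch}} are placeholders, not ExportWriter internals):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Sketch: a writer loop that blocks for the first available document, then
// drains whatever else is ready, so steady-state processing stays batched.
// 'Doc' and 'writeBatch' are placeholders, not ExportWriter internals.
void writerLoop(BlockingQueue<Doc> queue, int batchSize) throws InterruptedException {
  List<Doc> batch = new ArrayList<>(batchSize);
  while (true) {
    batch.clear();
    batch.add(queue.take());             // wait for at least one document
    queue.drainTo(batch, batchSize - 1); // grab up to a full batch without waiting
    writeBatch(batch);
  }
}
{code}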

> Redesign ExportWriter / ExportBuffers to work better with large batchSizes 
> and slow consumption
> ---
>
> Key: SOLR-17430
> URL: https://issues.apache.org/jira/browse/SOLR-17430
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by 
> the {{ExportHandler}} is brittle and the absolute time limit on how long the 
> buffer swapping threads will wait for each other isn't suitable for very long 
> running streaming expressions...
> {quote}The problem however is that this 600 second timeout may not be enough 
> to account for really slow downstream consumption of the data.  With really 
> large collections, and really complicated streaming expressions, this can 
> happen even with well behaved clients that are actively trying to consume 
> data.
> {quote}
> ...but another sub-optimal aspect of this buffer swapping design is that the 
> "writer" thread is initially completely blocked, and can't write out a single 
> document, until the "filler" thread has read the full {{batchSize}} of 
> documents into its buffer and opted to swap.  Likewise, after buffer swapping 
> has occurred at least once, any document in the {{outputBuffer}} that the 
> writer has already processed hangs around, taking up RAM, until the next 
> swap, while one of the threads is idle.  If {{batchSize=3}}, and the "filler" 
> thread is ready to go with a full {{fillBuffer}} while the "writer" has only 
> been able to emit 2 of the documents in its {{outputBuffer}} before being 
> blocked and forced to wait (due to the downstream consumer of the output 
> bytes) before it can emit the last document in its batch – that means both 
> the "writer" thread and the "filler" thread are stalled, taking up 2x the 
> batchSize of RAM, even though half of that is data that is no longer needed.
> The bigger the {{batchSize}}, the worse the initial delay (and steady-state 
> wasted RAM) is.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15272) Solr Admin UI uses non-standard unit for the number of docs

2021-03-17 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15272:
---

 Summary: Solr Admin UI uses non-standard unit for the number of 
docs
 Key: SOLR-15272
 URL: https://issues.apache.org/jira/browse/SOLR-15272
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: main (9.0)
Reporter: Andrzej Bialecki


I just noticed the following in the Admin UI / Cloud / Nodes section:
{quote}gettingstarted_s1r2 (1.9mn docs)
{quote}
AFAIK there's no widely recognized "mn" unit :) it should be "mln" or perhaps 
"M" (for the "mega" prefix).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-03-29 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15300:
---

 Summary: Shard "state" flag is confusing and of limited value to 
outside consumers
 Key: SOLR-15300
 URL: https://issues.apache.org/jira/browse/SOLR-15300
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


Solr API (and consequently the metric reporters, which are often used for Solr 
monitoring) report the shard as being in ACTIVE state even when in reality its 
functionality is severely compromised (eg. no replicas, all replicas down, or 
no leader).

This reported state is technically correct because it is used only for tracking 
of the SPLITSHARD operations, as defined in {{Slice.State}}. However, this may 
be misleading and more often unhelpful than not - for constant monitoring a 
flag that actually reports impaired functionality of a shard would be more 
useful than a flag that reports a relatively uncommon SPLITSHARD operation.

We could either redefine the meaning of the existing flag (and change its state 
according to some of the criteria I listed above), or add another flag to 
represent the "health" status of a shard. The value of this flag would then 
provide an easy way to monitor and to alert external systems of dangerous 
function impairment, without monitoring the state of all replicas of a 
collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15232) Add replica(s) as a part of node startup

2021-04-06 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15232:

Fix Version/s: main (9.0)

> Add replica(s) as a part of node startup
> 
>
> Key: SOLR-15232
> URL: https://issues.apache.org/jira/browse/SOLR-15232
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In containerized environments it would make sense to be able to initialize a 
> new node (pod) and designate it immediately to hold newly created replica(s) 
> of specified collection/shard(s) once it's up and running.
> Currently this is not easy to do: it requires the intervention of an external 
> agent that additionally has to first check if the node is up, all of which 
> makes the process needlessly complicated.
> This functionality could be as simple as adding a command-line switch to 
> {{bin/solr start}}, which would cause it to invoke appropriate ADDREPLICA 
> commands once it verifies the node is up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15232) Add replica(s) as a part of node startup

2021-04-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316328#comment-17316328
 ] 

Andrzej Bialecki commented on SOLR-15232:
-

Ad 1.
Solr autoscaling is not aware of how the pods are managed. When you add a new 
pod and want to populate it, this always requires additional actions, 
orchestrated by an outside agent.

In Solr 8x you could automate part of it (adding replicas to new empty nodes) 
by using {{nodeAdded}} triggers but this is gone in 9x.

Ad 2.
AFAIK there's no specific mechanism in k8s that would allow you to "customize" 
a particular pod instance on startup, i.e. to specify during pod creation that 
it should host a specific replica. Furthermore, external agents need to first 
check that the pod is up before proceeding, which complicates their design. 
This PR fills this void because you don't need any external agent to 
orchestrate the creation of replicas on new pods - you can just pass system 
properties to tell the pod to automatically add replicas that you want to host 
there, as soon as the Solr CoreContainer is up - and shut it down if it fails 
to initialize. This provides an easier way to auto-scale by using k8s 
autoscaler without modifications.

Ad 3.
This is again a multiple step process that requires an external agent to 
coordinate it. I'm not saying it can't be done (obviously), but the way I 
propose it could simplify the process.

> Add replica(s) as a part of node startup
> 
>
> Key: SOLR-15232
> URL: https://issues.apache.org/jira/browse/SOLR-15232
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In containerized environments it would make sense to be able to initialize a 
> new node (pod) and designate it immediately to hold newly created replica(s) 
> of specified collection/shard(s) once it's up and running.
> Currently this is not easy to do: it requires the intervention of an external 
> agent that additionally has to first check if the node is up, all of which 
> makes the process needlessly complicated.
> This functionality could be as simple as adding a command-line switch to 
> {{bin/solr start}}, which would cause it to invoke appropriate ADDREPLICA 
> commands once it verifies the node is up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-04-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316366#comment-17316366
 ] 

Andrzej Bialecki commented on SOLR-15300:
-

The {{replicationFactor}} is ill-defined, at least the way it's used. It 
doesn't reflect anything other than the initial setup - you are free to add or 
remove replicas and then it no longer holds true. It doesn't reflect per-shard 
replication either.

I would go even further - we should remove it from collection state because 
it's misleading.

Another question is "what is the intended replication factor and how to measure 
it"? This is not obvious either because it may depend on circumstances (eg. 
adding replicas during search traffic spikes and removing them afterwards). 
This may be a task for some external agent to figure out.

I think it's much easier to focus in this issue on clearly reporting the most 
common abnormal states - eg. shard has replicas down/recovering, shard has no 
replicas, shard has no leader.

Also, at the Java level you can already get all this information, so I think 
the scope of this issue is only what to do about the external reporting / 
monitoring, either via metrics or via ClusterState / Slice. As such, I think 
that we don't have to explicitly store this state anywhere, we can construct it 
on the fly for the purpose of reporting.

> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-04-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316419#comment-17316419
 ] 

Andrzej Bialecki commented on SOLR-15300:
-

So maybe it could be as simple as adding a "status" property for each shard to 
the CLUSTERSTATUS response, calculated on the fly.

> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-04-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317022#comment-17317022
 ] 

Andrzej Bialecki commented on SOLR-15300:
-

bq. Well, the intended replicationFactor for a given shard is the number of 
replicas currently registered with CLUSTERSTATUS

That would make sense, indeed - though this has no relation whatsoever to the 
actual value of the {{replicationFactor}} property.

bq. should either be in its own sub-tree next to "collections" or clearly 
marked as "_live-state" or similar

Agreed. I would prefer to put it into each collection's props, perhaps using a 
less awkward name like "liveState"? After all, we already report here other 
calculated data that doesn't come from state.json, such as aliases and roles.



> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-04-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319504#comment-17319504
 ] 

Andrzej Bialecki commented on SOLR-15300:
-

Based on the Slack discussions, I propose to add the following information to 
the output of CLUSTERSTATUS command:
 * add a calculated (not stored in DocCollection) "health" property at the 
level of each shard and each collection.
 * use the following symbolic names for the health state:
 ** GREEN: all replicas up, leader exists,
 ** YELLOW: some replicas down, leader exists,
 ** ORANGE: many replicas down, leader exists,
 ** RED: most replicas down, or no leader.
 * use 66% and 33% of active replicas as the thresholds between 
yellow/orange/red.
 * the collection-level health status will be reported as the worst status of 
any shard.
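
A sketch of the proposed per-shard calculation using the thresholds above 
(names and exact boundary handling are illustrative):

{code:java}
// Illustrative only: compute the proposed "health" value for one shard.
enum ShardHealth { GREEN, YELLOW, ORANGE, RED }

static ShardHealth shardHealth(int activeReplicas, int totalReplicas, boolean hasLeader) {
  if (!hasLeader || activeReplicas == 0) {
    return ShardHealth.RED; // no leader, or nothing up
  }
  double active = (double) activeReplicas / totalReplicas;
  if (active == 1.0) return ShardHealth.GREEN;   // all replicas up, leader exists
  if (active >= 0.66) return ShardHealth.YELLOW; // some replicas down
  if (active >= 0.33) return ShardHealth.ORANGE; // many replicas down
  return ShardHealth.RED;                        // most replicas down
}
// Collection-level health would be the worst status across all shards.
{code}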

The notion of having a flag for a "read only" collection (when there's no 
leader or only PULL replicas) needs further thought, because there's already a 
"readOnly" flag that users can explicitly set using MODIFYCOLLECTION (this flag 
is also used in REINDEXCOLLECTION).

> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15341) Lucene has removed CodecReader#ramBytesUsed

2021-04-15 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322213#comment-17322213
 ] 

Andrzej Bialecki commented on SOLR-15341:
-

bq. I'm not 100% sure where this ram info was used in Solr
It was purely informative; there should be no hard dependencies on it in Solr.

> Lucene has removed CodecReader#ramBytesUsed
> ---
>
> Key: SOLR-15341
> URL: https://issues.apache.org/jira/browse/SOLR-15341
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Due to LUCENE-9387 Solr no longer compiles. Accountability of CodecReader RAM 
> usage is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-04-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326713#comment-17326713
 ] 

Andrzej Bialecki commented on SOLR-15019:
-

That's a good point, which I didn't consider. I can revert this part of the 
change - [~ilan] WDYT?

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection, 
> the plugin should know the size of the replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection,
>     solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues =
>     attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Reopened] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-04-21 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reopened SOLR-15019:
-

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection, 
> the plugin should know the size of the replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection,
>     solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues =
>     attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-04-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327241#comment-17327241
 ] 

Andrzej Bialecki commented on SOLR-15019:
-

Ok, I'll remove this until we actually need it.

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection, 
> the plugin should know the size of the replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection,
>     solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues =
>     attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-04-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15019.
-
Resolution: Fixed

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection, 
> the plugin should know the size of the replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection,
>     solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues =
>     attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15379) Fix API incompatibility after LUCENE-9905

2021-04-28 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15379:
---

 Summary: Fix API incompatibility after LUCENE-9905
 Key: SOLR-15379
 URL: https://issues.apache.org/jira/browse/SOLR-15379
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15379) Fix API incompatibility after LUCENE-9905

2021-04-28 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15379:

Attachment: SOLR-15379.patch

> Fix API incompatibility after LUCENE-9905
> -
>
> Key: SOLR-15379
> URL: https://issues.apache.org/jira/browse/SOLR-15379
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Attachments: SOLR-15379.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15395) Report collection / shard "health" status in the Admin UI

2021-05-06 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15395:
---

 Summary: Report collection / shard "health" status in the Admin UI
 Key: SOLR-15395
 URL: https://issues.apache.org/jira/browse/SOLR-15395
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Admin UI
Affects Versions: main (9.0)
Reporter: Andrzej Bialecki


SOLR-15300 added a "health" status report to the output of CLUSTERSTATUS 
command. This should be also shown in the UI to allow users to visually check 
this status.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15396) Expose collection / shard "health" state in Prometheus exporter

2021-05-06 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15396:
---

 Summary: Expose collection / shard "health" state in Prometheus 
exporter
 Key: SOLR-15396
 URL: https://issues.apache.org/jira/browse/SOLR-15396
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: metrics
Affects Versions: main (9.0)
Reporter: Andrzej Bialecki


SOLR-15300 added a "health" status for collections and shards. This should be 
also exposed via Prometheus exporter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-05-06 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340137#comment-17340137
 ] 

Andrzej Bialecki commented on SOLR-15300:
-

[~janhoy] I created SOLR-15395 and SOLR-15396.

Cluster-level "health" status is somewhat different because it should probably 
consider not only the state of the collections but also that of the nodes. 
Let's discuss this in a separate Jira.

> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring, a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15232) Add replica(s) as a part of node startup

2021-05-12 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15232.
-
Resolution: Won't Do

Closing as Won't Do (didn't know this was an option :) ). As noted in the PR 
comments, this mechanism would be fragile, and there are better ways to do this 
in Kubernetes.

> Add replica(s) as a part of node startup
> 
>
> Key: SOLR-15232
> URL: https://issues.apache.org/jira/browse/SOLR-15232
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In containerized environments it would make sense to be able to initialize a 
> new node (pod) and designate it immediately to hold newly created replica(s) 
> of specified collection/shard(s) once it's up and running.
> Currently this is not easy to do: it requires the intervention of an external 
> agent that additionally has to first check whether the node is up, all of which 
> makes the process needlessly complicated.
> This functionality could be as simple as adding a command-line switch to 
> {{bin/solr start}}, which would cause it to invoke appropriate ADDREPLICA 
> commands once it verifies the node is up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15300) Shard "state" flag is confusing and of limited value to outside consumers

2021-05-17 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15300.
-
Fix Version/s: 8.9
   Resolution: Fixed

> Shard "state" flag is confusing and of limited value to outside consumers
> -
>
> Key: SOLR-15300
> URL: https://issues.apache.org/jira/browse/SOLR-15300
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) report the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (eg. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and more often unhelpful than not - for 
> constant monitoring, a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346758#comment-17346758
 ] 

Andrzej Bialecki commented on SOLR-14245:
-

I strongly disagree - let's not revert; instead, let's fix the bug that caused 
the invalid state! {{Replica}} is a critical piece of information; if it's 
invalid, then something seriously wrong has already happened.

That's the whole point of validation: to quickly catch errors that can cause 
long-term subtle corruption.
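
For illustration only, the kind of creation-time check being defended here 
might look roughly like this (a sketch with hypothetical field and parameter 
names, not the actual patch):
{code:java}
import java.util.Objects;

// Sketch only - field and parameter names here are hypothetical.
public Replica(String name, String nodeName, String coreName, State state) {
  this.name = Objects.requireNonNull(name, "'name' must not be null");
  this.nodeName = Objects.requireNonNull(nodeName, "'node_name' must not be null");
  this.coreName = Objects.requireNonNull(coreName, "'core' must not be null");
  this.state = Objects.requireNonNull(state, "'state' must not be null");
}
{code}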

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346758#comment-17346758
 ] 

Andrzej Bialecki edited comment on SOLR-14245 at 5/18/21, 9:58 AM:
---

I strongly disagree - let's not revert; instead, let's fix the bug that caused 
the invalid state! {{Replica}} is a critical piece of information; if it's 
invalid, then something seriously wrong has already happened.

That's the whole point of validation: to quickly catch errors that can cause 
long-term subtle corruption. If the validation logic is somehow faulty and 
there's an edge case that it should accept, then we can fix it - but I'm 
against removing it.


was (Author: ab):
I strongly disagree - let's not revert; instead, let's fix the bug that caused 
the invalid state! {{Replica}} is a critical piece of information; if it's 
invalid, then something seriously wrong has already happened.

That's the whole point of validation: to quickly catch errors that can cause 
long-term subtle corruption.

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346826#comment-17346826
 ] 

Andrzej Bialecki commented on SOLR-14245:
-

bq. how can deal with our attitude of not caring about users who may be 
affected by bugs or changes we introduce

[~ichattopadhyaya] and [~noble.paul]: I'm totally fed up with your ad hominem 
attacks. Please discuss this in a civil manner. If I were so inclined, I could 
also point fingers at many areas of code where you both rammed through totally 
buggy and sloppy code, and start implying you're ignorant and careless. I could 
also find many significant changes you guys made without PRs, or with a PR 
opened and committed within a couple of hours, without any review.

But I hope that ultimately we all have good intentions and we should fight the 
problem and not each other. On second thought, I agree with Noble that the 
validation should be more lenient (incidentally, the bug that causes 
{{node_name: null}} is likely related to a buggy roundtrip conversion between 
ReplicaInfo <-> Replica... and guess who added ReplicaInfo?).

I still object to removing the validation completely, but we can make it 
non-fatal - as I said above, admins should be aware when Solr is using 
corrupted data, because we really can't be sure what other long-term 
consequences it may cause.

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346834#comment-17346834
 ] 

Andrzej Bialecki commented on SOLR-14245:
-

Jira is for tracking technical issues, not for discussing our attitudes, 
whether real or implied. "We are callous" is a blanket statement to which I 
don't subscribe.

This issue is more than a year old - I think that instead of reopening it, a 
separate Jira should be created to discuss the proper fix. I'm unwilling to 
simply revert it because (as I explained above) the purpose of this change is 
still valid - it's important to be aware that a piece of critical Solr state 
is corrupted. Let's open a new Jira and discuss the fix.

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2021-05-18 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14245.
-

I'm closing this issue. Please create a separate Jira to discuss the bug and 
the fix - this issue is already 3 releases and 15 months old.

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15348) revisit MetricsHistoryHandler's "could not obtain overseer" WARNings

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347091#comment-17347091
 ] 

Andrzej Bialecki commented on SOLR-15348:
-

With the removal of autoscaling the usefulness of this handler is questionable 
- perhaps we should simply remove it (and the whole metrics history collection 
in Solr). I'll create a separate Jira for this.

> revisit MetricsHistoryHandler's "could not obtain overseer" WARNings
> 
>
> Key: SOLR-15348
> URL: https://issues.apache.org/jira/browse/SOLR-15348
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/handler/admin/MetricsHistoryHandler.java#L339



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15416) Consider removing metrics history collection (and MetricsHistoryHandler)

2021-05-18 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15416:
---

 Summary: Consider removing metrics history collection (and 
MetricsHistoryHandler)
 Key: SOLR-15416
 URL: https://issues.apache.org/jira/browse/SOLR-15416
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: metrics
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


Originally this functionality was meant to one day support more intelligent 
decisions in the autoscaling triggers that would react to the dynamics of the 
metrics changes. For this reason it was useful to keep track of the changes in 
the key metrics over time, without depending on any external systems.

With the removal of autoscaling the usefulness of this handler (and the 
collection of metrics history inside Solr) is questionable.

I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-15348) revisit MetricsHistoryHandler's "could not obtain overseer" WARNings

2021-05-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347091#comment-17347091
 ] 

Andrzej Bialecki edited comment on SOLR-15348 at 5/18/21, 5:48 PM:
---

With the removal of autoscaling the usefulness of this handler is questionable 
- perhaps we should simply remove it (and the whole metrics history collection 
in Solr). I'll create a separate Jira for this.

Edit: SOLR-15416


was (Author: ab):
With the removal of autoscaling the usefulness of this handler is questionable 
- perhaps we should simply remove it (and the whole metrics history collection 
in Solr). I'll create a separate Jira for this.

> revisit MetricsHistoryHandler's "could not obtain overseer" WARNings
> 
>
> Key: SOLR-15348
> URL: https://issues.apache.org/jira/browse/SOLR-15348
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/handler/admin/MetricsHistoryHandler.java#L339



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15416) Consider removing metrics history collection (and MetricsHistoryHandler)

2021-05-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15416:

Fix Version/s: main (9.0)

> Consider removing metrics history collection (and MetricsHistoryHandler)
> 
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15416) Consider removing metrics history collection (and MetricsHistoryHandler)

2021-05-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15416:

Attachment: SOLR-15416.patch

> Consider removing metrics history collection (and MetricsHistoryHandler)
> 
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15416) Consider removing metrics history collection (and MetricsHistoryHandler)

2021-05-20 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348578#comment-17348578
 ] 

Andrzej Bialecki commented on SOLR-15416:
-

This patch removes MetricsHistoryHandler, MetricsCollectorHandler, 
SolrClusterReporter / SolrShardReporter, and support for the Solr-backed RRD 
database (with the rrd4j dependency).

If there are no objections I'll commit this shortly.

> Consider removing metrics history collection (and MetricsHistoryHandler)
> 
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15416) Remove metrics history collection (and MetricsHistoryHandler)

2021-05-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15416:

Summary: Remove metrics history collection (and MetricsHistoryHandler)  
(was: Consider removing metrics history collection (and MetricsHistoryHandler))

> Remove metrics history collection (and MetricsHistoryHandler)
> -
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15425) Upgrade to Metrics 4.2.0

2021-05-20 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15425:
---

 Summary: Upgrade to Metrics 4.2.0
 Key: SOLR-15425
 URL: https://issues.apache.org/jira/browse/SOLR-15425
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: metrics
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


In addition to many fixes and compatibility with new Java versions, this 
release adds a {{LockFreeExponentiallyDecayingReservoir}}, which substantially 
reduces the cost of collecting histograms, especially for multi-threaded 
updates.
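
For example, a histogram backed by the new reservoir can be created roughly 
like this (a sketch against the Metrics 4.2 builder API):
{code:java}
import com.codahale.metrics.Histogram;
import com.codahale.metrics.LockFreeExponentiallyDecayingReservoir;
import com.codahale.metrics.MetricRegistry;

public class ReservoirDemo {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    // back the histogram with the lock-free reservoir instead of the default
    Histogram h = registry.register("update.times",
        new Histogram(LockFreeExponentiallyDecayingReservoir.builder().build()));
    h.update(42);
    System.out.println(h.getSnapshot().getMean());
  }
}
{code}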



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15425) Upgrade to Metrics 4.2.0

2021-05-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15425:

Description: In addition to many fixes and compatibility with new Java 
versions, this release adds a {{LockFreeExponentiallyDecayingReservoir}}, which 
substantially reduces the cost of collecting histograms, especially for 
multi-threaded updates. It also provides a no-op implementation of 
MetricRegistry, which would further reduce the already small overheads when 
metrics collection is turned off.  (was: In addition to many fixes and 
compatibility with new Java versions, this release adds a 
{{LockFreeExponentiallyDecayingReservoir}}, which substantially reduces the 
cost of collecting histograms, especially for multi-threaded updates.)

> Upgrade to Metrics 4.2.0
> 
>
> Key: SOLR-15425
> URL: https://issues.apache.org/jira/browse/SOLR-15425
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> In addition to many fixes and compatibility with new Java versions, this 
> release adds a {{LockFreeExponentiallyDecayingReservoir}}, which substantially 
> reduces the cost of collecting histograms, especially for multi-threaded 
> updates. It also provides a no-op implementation of MetricRegistry, which 
> would further reduce the already small overheads when metrics collection is 
> turned off.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15379) Fix API incompatibility after LUCENE-9905

2021-05-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15379.
-
Fix Version/s: main (9.0)
   Resolution: Fixed

> Fix API incompatibility after LUCENE-9905
> -
>
> Key: SOLR-15379
> URL: https://issues.apache.org/jira/browse/SOLR-15379
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: SOLR-15379.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-14749) Provide a clean API for cluster-level event processing

2021-05-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14749.
-
Resolution: Fixed

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: main (9.0)
>
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.
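> 
> For illustration, the pluggable part described above boils down to an 
> interface shaped roughly like this (a sketch with illustrative names, not 
> necessarily the final API):
> {code:java}
> // Sketch only - illustrative names, not necessarily the final API.
> public interface ClusterEventListener {
>   // Called by the built-in (non-pluggable) monitoring components whenever a
>   // cluster-level event (nodeLost, nodeAdded, scheduled, ...) is generated.
>   void onEvent(ClusterEvent event);
> }
> {code}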



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15416) Remove metrics history collection (and MetricsHistoryHandler)

2021-05-20 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348805#comment-17348805
 ] 

Andrzej Bialecki commented on SOLR-15416:
-

Good point - yes, we need to deprecate it. I'll prepare a patch for this too.

> Remove metrics history collection (and MetricsHistoryHandler)
> -
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15428) Integrate the OpenJDK JMH micro benchmark framework for micro benchmarks and performance comparisons and investigation.

2021-05-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349635#comment-17349635
 ] 

Andrzej Bialecki commented on SOLR-15428:
-

+1! Excellent idea.

> Integrate the OpenJDK JMH micro benchmark framework for micro benchmarks and 
> performance comparisons and investigation.
> ---
>
> Key: SOLR-15428
> URL: https://issues.apache.org/jira/browse/SOLR-15428
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mark Robert Miller
>Priority: Major
>
> I’ve spent a fair amount of time over the years on work around integrating 
> Lucene’s benchmark framework into Solr and while I’ve used this with 
> additional local work off and on, JMH has become somewhat of a standard for 
> micro benchmarks on the JVM. I have some work that provides an initial 
> integration, allowing for more targeted micro benchmarks as well as more 
> integration type benchmarking using JettySolrRunner. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15416) Remove metrics history collection (and MetricsHistoryHandler)

2021-05-24 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15416:

Attachment: SOLR-15416-8x.patch

> Remove metrics history collection (and MetricsHistoryHandler)
> -
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416-8x.patch, SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15416) Remove metrics history collection (and MetricsHistoryHandler)

2021-05-24 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350424#comment-17350424
 ] 

Andrzej Bialecki commented on SOLR-15416:
-

The other patch contains deprecations for 8x. I'll commit this shortly.

> Remove metrics history collection (and MetricsHistoryHandler)
> -
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416-8x.patch, SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15416) Remove metrics history collection (and MetricsHistoryHandler)

2021-05-24 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15416.
-
Resolution: Fixed

> Remove metrics history collection (and MetricsHistoryHandler)
> -
>
> Key: SOLR-15416
> URL: https://issues.apache.org/jira/browse/SOLR-15416
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: SOLR-15416-8x.patch, SOLR-15416.patch
>
>
> Originally this functionality was meant to one day support more intelligent 
> decisions in the autoscaling triggers that would react to the dynamics of the 
> metrics changes. For this reason it was useful to keep track of the changes 
> in the key metrics over time, without depending on any external systems.
> With the removal of autoscaling the usefulness of this handler (and the 
> collection of metrics history inside Solr) is questionable.
> I propose to remove it in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2021-05-26 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351771#comment-17351771
 ] 

Andrzej Bialecki commented on SOLR-11882:
-

[~dsmiley] I think we can do even better - move all the logic related to 
metrics to CoreContainer.load(). After all, if we fail to init the CC, the node 
and its metrics are unusable anyway. And when we close the CoreContainer, the 
metrics are not available either, so we can equally well do the cleanup in 
CC.shutdown().

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 7.4, 8.0
>
> Attachments: SOLR-11882-7x.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly hundred thousand), 
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide: 
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no 
> documents inside, the heap consumption went through the roof despite having 
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we 
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the 
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed 
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the 
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size 
> = 512m) and made a report (attached) using eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager 
> should be removed, in the same fashion the reporters for the core are also 
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully 
> unload a transient core when it is closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded, 
> but it's misleading as the cores are never fully unloaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15858) ConfigSetsHandler requires DIR entries in the uploaded ZIPs

2021-12-20 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15858:
---

 Summary: ConfigSetsHandler requires DIR entries in the uploaded 
ZIPs
 Key: SOLR-15858
 URL: https://issues.apache.org/jira/browse/SOLR-15858
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: configset-api
Affects Versions: 8.11.1
Reporter: Andrzej Bialecki


If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with `NoNodeException`.

This is caused by `ConfigSetsHandler.createZkNodeIfNotExistsAndSetData` which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for the presence of `/` characters in the 
entry name and make sure the ZK path already exists.
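
For example, roughly (a sketch that uses a hypothetical `ensureZkPath` helper 
to create missing intermediate nodes, similar to `mkdir -p`; not the actual 
fix):
{code:java}
// Sketch only - ensureZkPath is a hypothetical helper that creates any
// missing parent ZK nodes before the entry's data is set.
String name = zipEntry.getName();          // e.g. "lang/stopwords_en.txt"
int lastSlash = name.lastIndexOf('/');
if (lastSlash > 0) {
  ensureZkPath(configPathInZk + "/" + name.substring(0, lastSlash));
}
{code}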



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15858) ConfigSetsHandler requires DIR entries in the uploaded ZIPs

2021-12-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15858:

Description: 
If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with {{{}NoNodeException{}}}.

This is caused by {{ConfigSetsHandler.createZkNodeIfNotExistsAndSetData}} which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for `/` characters in the entry name and make 
sure the parent ZK path exists, creating it if necessary.

  was:
If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with `NoNodeException`.

This is caused by `ConfigSetsHandler.createZkNodeIfNotExistsAndSetData` which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for the presence of `/` characters in the 
entry name and make sure the ZK path already exists.


> ConfigSetsHandler requires DIR entries in the uploaded ZIPs
> ---
>
> Key: SOLR-15858
> URL: https://issues.apache.org/jira/browse/SOLR-15858
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: configset-api
>Affects Versions: 8.11.1
>Reporter: Andrzej Bialecki
>Priority: Major
>
> If you try uploading a configset zip that contains resources in sub-folders - 
> but doesn't contain explicit DIR entries in the zip file - the upload will 
> fail with {{{}NoNodeException{}}}.
> This is caused by {{ConfigSetsHandler.createZkNodeIfNotExistsAndSetData}} 
> which assumes the entry path doesn't contain sub-path elements. If the 
> corresponding DIR entries are present (and they occur earlier in the zip than 
> their child resource entries!) the handler will work properly because it 
> recognizes DIR entries and creates ZK paths as needed.
> The fix would be to always check for `/` characters in the entry name and 
> make sure the parent ZK path exists, creating it if necessary.






[jira] [Commented] (SOLR-16013) Overseer gives up election node before closing - inflight commands can be processed twice

2022-02-16 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493279#comment-17493279
 ] 

Andrzej Bialecki commented on SOLR-16013:
-

Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed. I agree that it 
should do this the other way around, and then simply check in 
`runLeaderProcess:76` that the Overseer is not closed.
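
A minimal sketch of the reversed ordering (error handling omitted):
{code:java}
@Override
public synchronized void close() {
  // first wait for the Overseer to finish its in-flight tasks...
  overseer.close();
  // ...and only then mark this election context as closed
  this.isClosed = true;
} {code}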

> Overseer gives up election node before closing - inflight commands can be 
> processed twice
> -
>
> Key: SOLR-16013
> URL: https://issues.apache.org/jira/browse/SOLR-16013
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
> customThreadPool.submit(() -> 
> IOUtils.closeQuietly(overseerElector.getContext()));
> customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its election node (via 
> overseerElector), allowing some other nodeY to be elected as the new 
> overseer, **BEFORE** Overseer nodeX shuts down its {{Overseer}} object, which 
> waits for the {{OverseerThread}} to finish processing any tasks in progress.
> In practice, this seems to make it possible for a single command in the 
> overseer queue to get processed twice.






[jira] [Comment Edited] (SOLR-16013) Overseer gives up election node before closing - inflight commands can be processed twice

2022-02-16 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493279#comment-17493279
 ] 

Andrzej Bialecki edited comment on SOLR-16013 at 2/16/22, 3:30 PM:
---

Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed. I agree that it 
should do this the other way around, and then simply check in 
`runLeaderProcess:76` that the Overseer is not closed.

Edit: I think the idea in `OverseerElectionContext` was primarily to avoid 
re-electing this Overseer, and then to wait until all its tasks are completed. 
But this allows another Overseer to be elected, which can process the in-flight 
tasks again as new ones.


was (Author: ab):
Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed, and I agree that it 
seems to me it should do it the other way around, and then simply check in 
`runLeaderProcess:76` if the Overseer is not closed.

> Overseer gives up election node before closing - inflight commands can be 
> processed twice
> -
>
> Key: SOLR-16013
> URL: https://issues.apache.org/jira/browse/SOLR-16013
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
> customThreadPool.submit(() -> 
> IOUtils.closeQuietly(overseerElector.getContext()));
> customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its election node (via 
> overseerElector), allowing some other nodeY to be elected as the new 
> overseer, **BEFORE** Overseer nodeX shuts down its {{Overseer}} object, which 
> waits for the {{OverseerThread}} to finish processing any tasks in progress.
> In practice, this seems to make it possible for a single command in the 
> overseer queue to get processed twice.






[jira] [Commented] (SOLR-16073) totalTime metric should be milliseconds (not nano)

2022-03-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503586#comment-17503586
 ] 

Andrzej Bialecki commented on SOLR-16073:
-

Removing the conversion may have been a mistake; we should consistently report 
time intervals using the same units. Currently we report the intervals inside 
histograms in milliseconds, but the elapsed times of Timers in nanoseconds.
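
Roughly, restoring the conversion at the increment site would look like this 
(variable names are assumptions; MetricUtils.nsToMs is the helper whose 
removal is discussed below):
{code:java}
long elapsed = timerContext.stop(); // Dropwizard Timers report nanoseconds
totalTime.inc((long) MetricUtils.nsToMs(elapsed)); // record milliseconds again
{code}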

Changing the units may have some back-compat consequences, and I'm not sure how 
to address them. Also, I can't say whether this metric is useful enough to be 
included by default in the exporter - generally speaking, since exporting 
metrics via the Prometheus exporter is a relatively heavyweight process, IMHO 
we should cut the number of exported metrics down to a bare minimum (whatever 
that means ;) ).

> totalTime metric should be milliseconds (not nano)
> --
>
> Key: SOLR-16073
> URL: https://issues.apache.org/jira/browse/SOLR-16073
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: David Smiley
>Priority: Minor
>
> I observed that the "totalTime" metric has been a nanosecond number in recent 
> years, yet once upon a time it was milliseconds. This change was very likely 
> inadvertent. Our prometheus solr-exporter-config.xml shows that it thinks 
> it's milliseconds. It's not; RequestHandlerBase increments this counter by 
> "elapsed", the response of timer.stop() -- nanoseconds. Years ago it had 
> invoked {{MetricUtils.nsToMs}} but it appears [~ab] removed this as a part of 
> other changes in 2017 sometime -- 
> https://github.com/apache/solr/commit/d8df9f8c9963c2fc1718fd471316bf5d964125ba
> Also, I question the value/purpose of this metric.  Is it so useful that it 
> deserves to be among our relatively few metrics exported in our default 
> prometheus exporter config?  It's been there since the initial config but I 
> wonder why anyone wants it.






[jira] [Commented] (SOLR-15502) MetricsCollectorHandler deprecated warning (missing documentation)

2021-07-05 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374793#comment-17374793
 ] 

Andrzej Bialecki commented on SOLR-15502:
-

Yes, we can remove the annotation (which will avoid the warning in the logs); 
it should be enough to keep the javadoc @deprecated tag (and the RefGuide 
notice). I'll fix this in 8x.

[~bwahlen] as Cassandra said, it's safe to ignore this warning - the warning 
will be removed in the next 8.x release, and the component is gone in 9.x.
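
A rough before/after sketch (class body elided; the exact javadoc wording is an 
assumption):
{code:java}
// before: the Java annotation is what triggers the "deprecated plugin" warning
/** @deprecated for internal purposes only, gone in 9.x. */
@Deprecated
public class MetricsCollectorHandler extends RequestHandlerBase { /* ... */ }

// after: keep only the javadoc tag - still documented, no startup log warning
/** @deprecated for internal purposes only, gone in 9.x. */
public class MetricsCollectorHandler extends RequestHandlerBase { /* ... */ }
{code}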

> MetricsCollectorHandler deprecated warning (missing documentation)
> --
>
> Key: SOLR-15502
> URL: https://issues.apache.org/jira/browse/SOLR-15502
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 8.9
>Reporter: Bernd Wahlen
>Priority: Minor
>
> After upgrading from 8.8.2 to 8.9.0 I got the following warning:
> MetricsCollectorHandler
> Solr loaded a deprecated plugin/analysis class 
> [org.apache.solr.handler.admin.MetricsCollectorHandler]. Please consult 
> documentation how to replace it accordingly.
> I found the corresponding change:
> https://solr.apache.org/docs/8_9_0/changes/Changes.html#v8.9.0.other_changes
> SOLR-15416
> but not how to solve it (the documentation mentioned in the warning is 
> missing).
> I also think the link to the documentation in the release notes has 
> changed/is broken:
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> =>
> https://gitbox.apache.org/repos/asf?p=solr.git;a=blob;f=solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> but I cannot find how to resolve the warning there either.
> I grepped my configs but couldn't find anything related.






[jira] [Resolved] (SOLR-15502) MetricsCollectorHandler deprecated warning (missing documentation)

2021-07-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15502.
-
  Assignee: Andrzej Bialecki
Resolution: Fixed

I removed this annotation in branch_8x. Thanks Bernd for reporting this!

> MetricsCollectorHandler deprecated warning (missing documentation)
> --
>
> Key: SOLR-15502
> URL: https://issues.apache.org/jira/browse/SOLR-15502
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 8.9
>Reporter: Bernd Wahlen
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.10
>
>
> After upgrading from 8.8.2 to 8.9.0 I got the following warning:
> MetricsCollectorHandler
> Solr loaded a deprecated plugin/analysis class 
> [org.apache.solr.handler.admin.MetricsCollectorHandler]. Please consult 
> documentation how to replace it accordingly.
> I found the corresponding change:
> https://solr.apache.org/docs/8_9_0/changes/Changes.html#v8.9.0.other_changes
> SOLR-15416
> but not how to solve it (the documentation mentioned in the warning is 
> missing).
> I also think the link to the documentation in the release notes has 
> changed/is broken:
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> =>
> https://gitbox.apache.org/repos/asf?p=solr.git;a=blob;f=solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> but I cannot find how to resolve the warning there either.
> I grepped my configs but couldn't find anything related.






[jira] [Updated] (SOLR-15502) MetricsCollectorHandler deprecated warning (missing documentation)

2021-07-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15502:

Fix Version/s: 8.10

> MetricsCollectorHandler deprecated warning (missing documentation)
> --
>
> Key: SOLR-15502
> URL: https://issues.apache.org/jira/browse/SOLR-15502
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 8.9
>Reporter: Bernd Wahlen
>Priority: Minor
> Fix For: 8.10
>
>
> After upgrading from 8.8.2 to 8.9.0 I got the following warning:
> MetricsCollectorHandler
> Solr loaded a deprecated plugin/analysis class 
> [org.apache.solr.handler.admin.MetricsCollectorHandler]. Please consult 
> documentation how to replace it accordingly.
> I found the corresponding change:
> https://solr.apache.org/docs/8_9_0/changes/Changes.html#v8.9.0.other_changes
> SOLR-15416
> but not how to solve it (the documentation mentioned in the warning is 
> missing).
> I also think the link to the documentation in the release notes has 
> changed/is broken:
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> =>
> https://gitbox.apache.org/repos/asf?p=solr.git;a=blob;f=solr/solr-ref-guide/src/solr-upgrade-notes.adoc
> but I cannot find how to resolve the warning there either.
> I grepped my configs but couldn't find anything related.





