Also, all the problems are in the DBs (the backend Prometheus instances), not the front end.
On Wednesday, April 5, 2023 at 4:50:39 PM UTC-4 Johny wrote:
> Prometheus version is 2.39.1
>
> There are many users and some legacy clients that add friction to changing
> queries across the board.
> During ingestion, we can make use of relabeling to drop labels
> automatically.
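>
> A minimal sketch of what we have in mind (assuming the label really is
> named global_label and the relabeling is applied per scrape job):
>
>   scrape_configs:
>     - job_name: 'example'              # placeholder job
>       static_configs:
>         - targets: ['localhost:9100']  # placeholder target
>       metric_relabel_configs:
>         # Drop the constant label from every ingested series
>         - action: labeldrop
>           regex: global_label
>
> Existing blocks would of course still carry the label until they age out;
> only newly ingested samples are affected.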
>
> I am fairly certain this is the root cause of the performance degradation
> in the system, as we're able to reproduce the problem in a load test:
> simulating queries with and without the label filter in question, the
> queries without it perform much better and show no memory problems.
>
>
>
> On Wednesday, April 5, 2023 at 3:50:08 PM UTC-4 Brian Candler wrote:
>
>> I wonder if the filtering algorithm is really as simplistic as the
>> Timescale blog implies ("for every label/value pair, first find *every*
>> possible series which matches; then take the intersection of the
>> results")? I don't know; I'll leave others to answer that. If it keeps
>> internal stats so that it can start with the label that matches the
>> fewest series, I'd expect it to use them, and the TSDB stats in the web
>> interface suggest that it does.
>>
>> I ask again: what version(s) of Prometheus are you running?
>>
>> Are you experiencing this with all prometheus components, i.e. a
>> prometheus front-end talking to prometheus back-ends with remote_read?
>>
>> I think the ideal thing would be to narrow this down to a reproducible
>> test case: either a particular pattern of remote_read queries which is
>> performing badly at the backend, or a particular query sent to the
>> front-end which is being sent to the backend in a suboptimal way (e.g. not
>> including all possible label filters at once).
>>
>> You said "for now we need a workaround". Is it not sufficient simply to
>> remove {*global_label="constant-value"*} from your queries? After all,
>> you're already thinking about removing this label at ingestion time, and if
>> you do that, you won't be able to filter on it anyway.
>>
>> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>>
>>> The number of time series per metric, for a few selected metrics, is
>>> close to 2 million today. For scalability, we shard the data onto a few
>>> Prometheus instances and use remote read from a front-end Prometheus to
>>> fetch data from the storage units.
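>>>
>>> Roughly, the front-end's configuration looks like this (hostnames are
>>> placeholders, not the real shard names):
>>>
>>>   remote_read:
>>>     - url: http://prom-shard-1:9090/api/v1/read
>>>       read_recent: true
>>>     - url: http://prom-shard-2:9090/api/v1/read
>>>       read_recent: true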
>>>
>>> The series are fetched from time-series blocks by taking an
>>> intersection of series (or postings) across all label filters in the
>>> query: first, the index postings are scanned for each label filter;
>>> second, the matching series are found with an implicit AND across the
>>> filters. From my understanding, a low-cardinality label present in all
>>> series causes a large portion of the index to be loaded into memory
>>> during the first step. We've also observed memory spikes during query
>>> processing when the system gets a steady stream of queries. Without
>>> this filter, memory usage is lower and queries return much faster.
>>>
>>>
>>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look&text=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>>> .
>>>
>>> So, I believe that if we exclude the constant label at ingestion, we
>>> won't have this problem in the long term. In the meantime, excluding
>>> this filter somewhere in the front end would help mitigate it.
>>>
>>>
>>>
>>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>>
>>>> Also: how many timeseries are you working with, in terms of the
>>>> "my_series" that you are querying, and globally on the whole system?
>>>>
>>>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>>>
>>>>> Adding a constant label to every timeseries should have almost zero
>>>>> impact on memory usage.
>>>>>
>>>>> Can you clarify what you're saying, and how you've come to your
>>>>> diagnosis? What version of prometheus are you running? When you say
>>>>> "backends" in the plural, how have you set this up?
>>>>>
>>>>> At one point you seem to be saying it's something to do with
>>>>> ingestion, but then you seem to be saying it's something to do with
>>>>> queries
>>>>> (*"Without this filter, the queries run reasonably well"*). Can you
>>>>> give specific examples of filters which show the difference in behaviour?
>>>>>
>>>>> Again: the queries
>>>>> my_series{global_label="constant-value", l1="..", l2=".."}
>>>>> my_series{l1="..", l2=".."}
>>>>> should perform almost identically, as they will select the same subset
>>>>> of timeseries.
>>>>>
>>>>> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>>>>>
>>>>>> There is a performance-related issue we're facing in Prometheus,
>>>>>> coming from a label with a constant value across all (thousands of)
>>>>>> time series. The label filter in queries causes a large quantity of
>>>>>> metadata to be loaded into memory, overwhelming the Prometheus
>>>>>> backends. Without this filter, the queries run reasonably well. We
>>>>>> are planning to exclude this label at ingestion in the future, but
>>>>>> for now we need a workaround.
>>>>>>
>>>>>> my_series{*global_label="constant-value"*, l1="..", l2=".."}
>>>>>>
>>>>>> Is there a mechanism to automatically exclude global_label in the
>>>>>> query configuration (the remote_read subsection, or elsewhere)?
>>>>>>
>>>>>> thanks,
>>>>>> Johny
>>>>>>
>>>>>>
>>>>>>
>>>>>>