Also, all the problems are in the DBs (the backend Prometheus instances), not the front end.
On Wednesday, April 5, 2023 at 4:50:39 PM UTC-4 Johny wrote:
> Prometheus version is 2.39.1
>
> There are many users and some legacy clients that add friction to changing
> queries across the board.
> During ingestion, we can make use of relabeling to drop labels
> automatically.
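>
> A minimal sketch of what we have in mind (assuming the label really is
> named global_label and the relabeling is applied per scrape job):
>
>   scrape_configs:
>     - job_name: 'example'              # placeholder job
>       static_configs:
>         - targets: ['localhost:9100']  # placeholder target
>       metric_relabel_configs:
>         # Drop the constant label from every ingested series
>         - action: labeldrop
>           regex: global_label
>
> Existing blocks would of course still carry the label until they age out;
> only newly ingested samples are affected.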
>
> I am fairly certain this is the root cause of the performance degradation
> in the system, as we're able to reproduce the problem in a load test:
> simulating queries with and without the label filter in question, the
> queries without it perform much better and show no memory problems.
>
>
>
> On Wednesday, April 5, 2023 at 3:50:08 PM UTC-4 Brian Candler wrote:
>
>> I wonder if the filtering algorithm is really as simplistic as the
>> Timescale blog implies ("for every label/value pair, first find *every*
>> possible series which matches; then take the intersection of the
>> results")? I don't know; I'll leave others to answer that. If it keeps
>> internal stats so that it can start with the label that matches the
>> fewest series, I'd expect it to use them, and the TSDB stats in the web
>> interface suggest that it does.
>>
>> I ask again: what version(s) of Prometheus are you running?
>>
>> Are you experiencing this with all prometheus components, i.e. a
>> prometheus front-end talking to prometheus back-ends with remote_read?
>>
>> I think the ideal thing would be to narrow this down to a reproducible
>> test case: either a particular pattern of remote_read queries which is
>> performing badly at the backend, or a particular query sent to the
>> front-end which is being sent to the backend in a suboptimal way (e.g. not
>> including all possible label filters at once).
>>
>> You said "for now we need a workaround". Is it not sufficient simply to
>> remove {*global_label="constant-value"*} from your queries? After all,
>> you're already thinking about removing this label at ingestion time, and if
>> you do that, you won't be able to filter on it anyway.
>>
>> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>>
>>> The number of time series per metric, for a few selected metrics, is
>>> close to 2 million today. For scalability, we shard the data onto a few
>>> Prometheus instances and use remote read from a front-end Prometheus to
>>> fetch data from the storage units.
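>>>
>>> Roughly, the front-end's configuration looks like this (hostnames are
>>> placeholders, not the real shard names):
>>>
>>>   remote_read:
>>>     - url: http://prom-shard-1:9090/api/v1/read
>>>       read_recent: true
>>>     - url: http://prom-shard-2:9090/api/v1/read
>>>       read_recent: true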
>>>
>>> The series are fetched from time-series blocks by taking an
>>> intersection of series (or postings) across all label filters in the
>>> query: first, the index postings are scanned for each label filter;
>>> second, the matching series are found with an implicit AND across the
>>> filters. From my understanding, a low-cardinality label present in all
>>> series causes a large portion of the index to be loaded into memory
>>> during the first step. We've also observed memory spikes during query
>>> processing when the system gets a steady stream of queries. Without
>>> this filter, memory usage is lower and queries return much faster.
>>>
>>>
>>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look&text=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>>> .
>>>
>>> So, I believe that if we exclude the constant label at ingestion, we
>>> won't have this problem in the long term. In the meantime, excluding
>>> this filter somewhere in the front end would help mitigate it.
>>>
>>>
>>>
>>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>>
>>>> Also: how many timeseries are you working with, in terms of the
>>>> "my_series" that you are querying, and globally on the whole system?
>>>>
>>>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>>>
>>>>> Adding a constant label to every timeseries should have almost zero
>>>>> impact on memory usage.
>>>>>
>>>>> Can you clarify what you're saying, and how you've come to your
>>>>> diagnosis? What version of prometheus are you running? When you say
>>>>> "backends" in the plural, how have you set this up?
>>>>>
>>>>> At one point you seem to be saying it's something to do with
>>>>> ingestion, but then you seem to be saying it's something to do with
>>>>> queries
>>>>> (*"Without this filter, the queries run reasonably well"*). Can you
>>>>> give specific examples of filters which show the difference in behaviour?
>>>>>
>>>>> Again: the queries
>>>>> my_series{global_label="constant-value", l1="..", l2=".."}
>>>>> my_series{l1="..", l2=".."}
>>>>> should perform almost identically, as they will select the same subset
>>>>> of timeseries.
>>>>>
>>>>> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>>>>>
>>>>>> There is a performance-related issue we're facing in Prometheus,
>>>>>> coming from a label with a constant value across all (thousands of)
>>>>>> time series. The label filter in queries causes a large quantity of
>>>>>> metadata to be loaded into memory, overwhelming the Prometheus
>>>>>> backends. Without this filter, the queries run reasonably well. We
>>>>>> are planning to exclude this label at ingestion in the future, but
>>>>>> for now we need a workaround.
>>>>>>
>>>>>> my_series{*global_label="constant-value"*, l1="..", l2=".."}
>>>>>>
>>>>>> Is there a mechanism to automatically exclude global_label in the
>>>>>> query configuration (the remote_read subsection, or elsewhere)?
>>>>>>
>>>>>> thanks,
>>>>>> Johny
>>>>>>
>>>>>>
>>>>>>
>>>>>>