Re: [DISCUSS] Remove realm_id metric tag

2025-05-26 Thread Jean-Baptiste Onofré
Hi Alex Thanks ! (I will take a look at the PR). Then it looks good to me. Regards JB On Mon, May 26, 2025 at 5:33 PM Alex Dutra wrote: > > Hi JB, hi all, > > My PR already has a safety net similar to what you propose: if the > cardinality of realm IDs in HTTP metrics goes above a configurable

Re: [DISCUSS] Remove realm_id metric tag

2025-05-26 Thread Alex Dutra
Hi JB, hi all, My PR already has a safety net similar to what you propose: if the cardinality of realm IDs in HTTP metrics goes above a configurable threshold (100 by default), a warning is printed and no more HTTP metrics will be recorded. Thanks, Alex On Mon, May 26, 2025 at 5:22 PM Dmitri Bou

Re: [DISCUSS] Remove realm_id metric tag

2025-05-26 Thread Dmitri Bourlatchkov
If we really need a safety valve, add a boolean flag like "metrics.realm-id.enabled", default true. Operators who don’t want the dimension can flip it off; everyone else keeps the out-of-the-box visibility that keeps multi-tenant ops sane. This is exactly what PR [1662] proposes, right? The "bre

Re: [DISCUSS] Remove realm_id metric tag

2025-05-26 Thread Jean-Baptiste Onofré
Hi, What about a limit on the realm_id cardinality metric as a compromise ? Like we keep the realm_id metric, but by configuration, the user defines a limit (to avoid flooding/DoS). This is just a "temp" workaround: we should request the metric only if the realm is valid, and it should be coupled

Re: [DISCUSS] Remove realm_id metric tag

2025-05-24 Thread Yufei Gu
Hi Alex, thanks for being flexible. Given how valuable the tag is for day-to-day triage, I’d prefer an opt-out model: 1. Default is enabled, so that operators get realm-level visibility “for free.” 2. Anyone worried about cardinality can turn it off with a single flag (metrics.realm-id.enabled=fal

Re: [DISCUSS] Remove realm_id metric tag

2025-05-24 Thread Alex Dutra
Hi Yufei, Thanks for sharing your concerns. In fact we were already leaning towards an enablement flag, however from Mike's message I inferred that the metric tag would be an opt-in feature, but you seem to suggest an opt-out feature instead. I'm fine either way, the PR I opened uses opt-in, but

Re: [DISCUSS] Remove realm_id metric tag

2025-05-23 Thread Yufei Gu
Hey folks, I’m not sold on dropping realm_id. Why it matters 1. It's pretty useful for first-line triage in multi-tenant clusters. When a latency spike or error burst hits the dashboards, the first SRE question is “Which realm(s)?” Having the tag lets us slice the time-series instant

Re: [DISCUSS] Remove realm_id metric tag

2025-05-23 Thread Alex Dutra
Hi, For reference the PR is here: https://github.com/apache/polaris/pull/1662 Alex On Fri, May 23, 2025 at 2:42 PM Robert Stupp wrote: > > +1 on adding a config and default to false - and later entirely remove > the "issue". > > Sounds like good plan to me! > > On 23.05.25 13:52, Alex Dutra wro

Re: [DISCUSS] Remove realm_id metric tag

2025-05-23 Thread Robert Stupp
+1 on adding a config and default to false - and later entirely remove the "issue". Sounds like good plan to me! On 23.05.25 13:52, Alex Dutra wrote: Hi Mike, That sounds like a plan! And yes, I agree with the slow approach. I will open a PR shortly with the new enablement flag. Alex On Fri

Re: [DISCUSS] Remove realm_id metric tag

2025-05-23 Thread Alex Dutra
Hi Mike, That sounds like a plan! And yes, I agree with the slow approach. I will open a PR shortly with the new enablement flag. Alex On Fri, May 23, 2025 at 2:42 AM Michael Collado wrote: > > My immediate reaction is that a configuration flag would be good enough for > us. Our setup prevents

Re: [DISCUSS] Remove realm_id metric tag

2025-05-22 Thread Michael Collado
My immediate reaction is that a configuration flag would be good enough for us. Our setup prevents requests without a valid realm, so we're not subject to the DDOS attack you mentioned, but we should address that in the default realm context resolver as well. I can do some further investigation to

Re: [DISCUSS] Remove realm_id metric tag

2025-05-22 Thread Alex Dutra
Hi again, Let's go back to the main concern: excessive memory required for metrics. To move this discussion forward, would you all agree to introduce a flag that activates the inclusion of realm IDs in the metrics? It seems like this could be a good middle ground that addresses everyone's needs.

Re: [DISCUSS] Remove realm_id metric tag

2025-05-21 Thread Robert Stupp
Ouch! That sounds very simple to exploit and quite serious. On 21.05.25 15:38, Alex Dutra wrote: Also worth noting: http_server_requests_seconds_* metrics are generated even for failed requests. So it would be very easy for an attacker to forge HTTP requests containing invalid realms and take th

Re: [DISCUSS] Remove realm_id metric tag

2025-05-21 Thread Alex Dutra
Also worth noting: http_server_requests_seconds_* metrics are generated even for failed requests. So it would be very easy for an attacker to forge HTTP requests containing invalid realms and take the server down. Alex On Wed, May 21, 2025 at 3:04 PM Robert Stupp wrote: > OOMs are bad - it's ex

Re: [DISCUSS] Remove realm_id metric tag

2025-05-21 Thread Robert Stupp
OOMs are bad - it's extremely hard (if not impossible) for any user to figure out why that happens. Bug reports would just read like "Polaris runs into an OOM - no further information available". IMHO the system should be stable by default, not the other way around. Even if there is a way to e

Re: [DISCUSS] Remove realm_id metric tag

2025-05-21 Thread Alex Dutra
Hi Mike, If you have around a hundred realms or more, imho you are already in trouble. The most problematic metric is the http_server_requests_seconds_* metrics family, e.g.: http_server_requests_seconds_count{application="Polaris",method="GET",outcome="CLIENT_ERROR",realm_id="POLARIS",status="4

Re: [DISCUSS] Remove realm_id metric tag

2025-05-20 Thread Michael Collado
Hmm, we do use the realm tag in our metric publishing. I understand the concern re: cardinality. Maybe we can support filtering metrics that have realm and support another metric without realm? On Mon, May 19, 2025 at 12:24 PM Dmitri Bourlatchkov wrote: > Removing realm_id from metrics tags make

Re: [DISCUSS] Remove realm_id metric tag

2025-05-19 Thread Dmitri Bourlatchkov
Removing realm_id from metrics tags makes sense to me (to avoid high cardinality). If we need to have insight into load differences from realm to realm, it might be preferable to introduce metrics dedicated to that rather than increasing the cardinality of every endpoint metric. Cheers, Dmitri.

Re: [DISCUSS] Remove realm_id metric tag

2025-05-19 Thread Robert Stupp
Hi, I think it's okay to remove the realm-ID from the metric tags and leave it in traces. So +1 from me on doing this. High cardinality values are not good for metrics (or metrics systems) and can easily cause a lot of "interesting situations" in production systems - things that are hard to

[DISCUSS] Remove realm_id metric tag

2025-05-15 Thread Alex Dutra
Hi all, I would like to suggest removing the "realm_id" metric tag entirely. My concern is that this tag has the potential for high cardinality, which is generally considered a bad practice when dealing with metrics. High cardinality can lead to performance issues and increased memory usage. Gra