It's a good topic to bring up.
Have you tried the JFR support for method timing and tracing events
that JEP 520 introduced in JDK 25? I'm wondering if
-XX:StartFlightRecording:jdk.MethodTrace#filter=java.net.InetAddress::getByName
records events that could help here.
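
For anyone who wants to try this from code rather than the command line, here is a rough sketch. It assumes a JDK 25 runtime where the jdk.MethodTrace event and its "filter" setting exist; on older JDKs the setting is simply ignored and no events are recorded. The class and method names (TraceGetByName, traceLookup) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class TraceGetByName {
    // Performs one lookup under a recording and returns how many
    // jdk.MethodTrace events were captured (0 if the event is unavailable).
    static int traceLookup(String host) {
        try (Recording recording = new Recording()) {
            // Programmatic equivalent of
            // jdk.MethodTrace#filter=java.net.InetAddress::getByName
            recording.enable("jdk.MethodTrace")
                     .with("filter", "java.net.InetAddress::getByName");
            recording.start();
            try {
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                // A failed lookup is still a traced call; nothing else to do.
            }
            recording.stop();

            Path file = Files.createTempFile("lookup", ".jfr");
            recording.dump(file);
            int count = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (event.getEventType().getName().equals("jdk.MethodTrace")) {
                    System.out.println(event); // prints all fields of the event
                    count++;
                }
            }
            Files.deleteIfExists(file);
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("MethodTrace events: " + traceLookup("localhost"));
    }
}
```

This only shows that a call was made and how long it took. It says nothing about whether the answer came from the cache, which is the gap the proposal is about.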
If new events are introduced then I could imagine them having
"NameService" rather than "Dns" in the name, as the JDK doesn't use DNS
directly (except in the JNDI-DNS provider); it uses whatever name service
is configured on the system.
-Alan
On 13/11/2025 11:07, [email protected] wrote:
Hello,
I would like to start a discussion on introducing new JFR events for
DNS lookups. While many lookups are DNS in cloud-native environments,
the JDK uses the configured name service, so the event naming and
semantics should not imply DNS-only behavior. I’m seeking feedback on
scope, naming, and payload fields.
Motivation
* High-frequency, latency-sensitive lookups are critical for service
discovery.
* Current gaps:
o Cannot distinguish cache hits vs. network lookups
o Hard to trace lookup latency and diagnose timeouts/failures
o Concurrent libraries may cause redundant lookups
* Value:
o End-to-end observability: lookup → socket connect → data transfer
o Troubleshooting: identify timeouts, resolution failures
o Performance: evaluate cache policies, detect hotspot names
o Security: audit external domains accessed
*Proposed event (initial draft)*
*Event name:* jdk.DnsLookup
*When:* Emitted around DNS hostname resolution call boundaries, including:
* Actual network DNS queries (when cache is disabled or cache miss
occurs)
* Cache hits (when result is retrieved from DNS cache)
* Stale data usage (when expired but still valid cached data is used)
* Background DNS cache refresh operations
*Key fields (feedback welcome):*
* host (String): The hostname being resolved
* result (String): Comma-separated list of resolved IP addresses, or
error message if lookup failed
* success (boolean): Whether the DNS lookup was successful
* cached (boolean): Whether the result was retrieved from the cache
(true) or from an actual DNS network query (false). Together with
the stale field, this distinguishes three cases:
o Actual network queries (cached=false) - represents real DNS
network traffic
o Cache hits (cached=true, stale=false) - repeated lookups using
fresh cached data
o Stale data usage (cached=true, stale=true) - application
continues with expired but still valid cached data when DNS
refresh fails
* ttl (long, seconds): Time to live. Values:
o 0: Not cached
o -1: Cached forever
o > 0: Actual remaining TTL if cached
* stale (boolean): Whether stale cached data was used (only valid
when cached=true). Helps identify semi-error scenarios where DNS
errors occur but application continues using stale cached records
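
To make the payload concrete, here is a minimal application-level sketch using the public jdk.jfr.Event API. It is not the implementation in the PR. The event name example.NameServiceLookup and the class names are hypothetical, and only host, result and success are filled in, because cached, ttl and stale are known only to the JDK's internal address cache and cannot be observed from a wrapper like this.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.stream.Collectors;
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

class NameLookupDemo {
    // Hypothetical event; the name and fields are illustrative only.
    @Name("example.NameServiceLookup")
    @Label("Name Service Lookup")
    @Category("Example")
    static class NameServiceLookupEvent extends Event {
        @Label("Host") String host;
        @Label("Result") String result;
        @Label("Success") boolean success;
    }

    // Resolves the host, commits one event describing the outcome, and
    // returns the same string that was stored in the event's result field.
    static String lookup(String host) {
        NameServiceLookupEvent event = new NameServiceLookupEvent();
        event.host = host;
        event.begin(); // start of the duration measured by JFR
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            event.result = Arrays.stream(addresses)
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(","));
            event.success = true;
        } catch (UnknownHostException e) {
            event.result = String.valueOf(e.getMessage());
            event.success = false;
        } finally {
            event.commit(); // discarded cheaply if no recording is running
        }
        return event.result;
    }

    public static void main(String[] args) {
        System.out.println(lookup("localhost"));
    }
}
```

The duration comes for free from begin() and commit(), so the draft does not need a separate latency field.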
*Event name:* jdk.DnsCacheStatistics
*When:* Periodic event emitted at configurable intervals (default: 5
seconds in default.jfc, 1 second in profile.jfc). This is a statistics
event similar to jdk.ExceptionStatistics, providing aggregate metrics
about the DNS cache state.
*Key fields (feedback welcome):*
* cacheSize (long): Current number of entries in the DNS cache.
Useful for monitoring cache growth and understanding cache
utilization patterns.
* staleEntries (long): Number of stale entries currently in the
cache (entries that have expired but are still within the stale
period). Helps identify how many entries are using stale data,
which is important for understanding cache behavior in scenarios
where DNS refresh fails.
* entriesRemoved (long): Number of entries that have been removed
during cache cleanup operations. This metric tracks cache eviction
and helps understand cache churn patterns, which is particularly
useful in Kubernetes and cloud-native environments where DNS
entries may change frequently.
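
For illustration, a periodic statistics event of this shape can be sketched with FlightRecorder.addPeriodicEvent. This is an assumption-laden example rather than the code in the PR. The event name example.NameCacheStatistics is hypothetical, and the three AtomicLong counters stand in for state that in a real implementation would live inside the JDK's address cache.

```java
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;
import jdk.jfr.StackTrace;

class CacheStatsDemo {
    // Hypothetical event mirroring the proposed fields.
    @Name("example.NameCacheStatistics")
    @Label("Name Cache Statistics")
    @Period("5 s")      // default period; a .jfc file can override it
    @StackTrace(false)  // stack traces carry no meaning for periodic events
    static class NameCacheStatisticsEvent extends Event {
        @Label("Cache Size") long cacheSize;
        @Label("Stale Entries") long staleEntries;
        @Label("Entries Removed") long entriesRemoved;
    }

    // Placeholder counters; a real cache would maintain these itself.
    static final AtomicLong CACHE_SIZE = new AtomicLong();
    static final AtomicLong STALE_ENTRIES = new AtomicLong();
    static final AtomicLong ENTRIES_REMOVED = new AtomicLong();

    // Keep one Runnable instance so the same hook can be removed later.
    static final Runnable HOOK = CacheStatsDemo::emit;

    // Snapshots the counters into one event and commits it.
    static void emit() {
        NameCacheStatisticsEvent event = new NameCacheStatisticsEvent();
        event.cacheSize = CACHE_SIZE.get();
        event.staleEntries = STALE_ENTRIES.get();
        event.entriesRemoved = ENTRIES_REMOVED.get();
        event.commit();
    }

    static void register() {
        FlightRecorder.addPeriodicEvent(NameCacheStatisticsEvent.class, HOOK);
    }

    static boolean unregister() {
        return FlightRecorder.removePeriodicEvent(HOOK);
    }
}
```

JFR calls the hook at the configured period while a recording has the event enabled, so the counters are only read when someone is actually recording.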
*Use cases:*
* Monitoring DNS cache size growth over time
* Identifying cache cleanup frequency and patterns
* Understanding stale data usage in production environments
* Troubleshooting DNS-related performance issues in microservices
architectures
* Observing cache behavior during DNS server failures or network
partitions
Prototype/PR
* A preliminary PR is available for context and discussion:
o https://git.openjdk.org/jdk/pull/28110
* I will update the design/implementation per feedback from this thread.
Thanks in advance for your feedback!
Best regards,
NeayGuyCoding