Re: Troubleshooting checkpoint expiration

2024-08-31 Thread Alexis Sarda-Espinosa
Well, for future reference, this helped in the case of ABFS:
logger.abfs.name = org.apache.hadoop.fs.azurebfs.services.AbfsClient
logger.abfs.level = DEBUG
logger.abfs.filter.failures.type = RegexFilter
logger.abfs.filter.failures.regex = ^.*([Ff]ail|[Rr]etry|: [45][0-9]{2},).*$
logger.abfs.filte
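To see which DEBUG messages the RegexFilter above would let through, the pattern can be tried offline. The sketch below is a rough check, not part of the original message. It assumes the filter accepts matching messages and rejects the rest, since the remainder of the configuration is cut off in the archive. It also assumes Log4j2 matches the whole message, which is mimicked here with Python's `re.fullmatch`. The sample messages are made up for illustration and are not real AbfsClient output.

```python
import re

# Same pattern as logger.abfs.filter.failures.regex in the configuration above.
# fullmatch mirrors a whole-message match (assumption about how the filter applies it).
FAILURE_PATTERN = re.compile(r"^.*([Ff]ail|[Rr]etry|: [45][0-9]{2},).*$")


def would_pass_filter(message: str) -> bool:
    """Return True if the message mentions a failure, a retry, or ': 4xx,' / ': 5xx,'."""
    return FAILURE_PATTERN.fullmatch(message) is not None


# Hypothetical messages, invented for illustration only.
samples = [
    "HttpRequest: 200,ok,...",
    "HttpRequest: 503,ServerBusy,...",
    "Retrying request after backoff",
    "Operation failed with exception",
    "Request completed successfully",
]

for msg in samples:
    print(would_pass_filter(msg), msg)
```

With these inputs, only the 503 line, the retry line and the "failed" line match. Successful 2xx requests are dropped, so the DEBUG level stays usable without flooding the logs.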

Re: Troubleshooting checkpoint expiration

2024-08-07 Thread Alexis Sarda-Espinosa
I must ask again whether anyone at least knows if Flink's file system can expose more detailed exceptions when things go wrong; Azure support is asking for specific exception messages to decide how to troubleshoot. Regards, Alexis. On Tue., 23 July 2024 at 13:39, Alexis Sarda-Espinosa < sar

Re: Troubleshooting checkpoint expiration

2024-07-23 Thread Alexis Sarda-Espinosa
Hi again, I found a Hadoop class that can log latency information [1], but since I don't see any exceptions in the logs when a checkpoint expires due to timeout, I'm still wondering if I can change other log levels to get more insights, maybe somewhere in Flink's file system abstractions? [1] htt

Troubleshooting checkpoint expiration

2024-07-19 Thread Alexis Sarda-Espinosa
Hello, We have a Flink job that uses ABFSS for checkpoints and related state. Lately we have been seeing a lot of exceptions due to checkpoint expiration, and I'm guessing that's an issue in the infrastructure or on Azure's side, but I was wondering if there are Flink/Hadoop Java packages that log potentia