Re: Continuous errors with Azure ABFSS

2023-11-10 Thread Alexis Sarda-Espinosa
After enabling some more logging for the storage account, I figured out the errors correspond to 404 PathNotFound responses. My guess is the file system checks the status of a path to see if it exists or not before trying to write to it, in this case for _metadata files from each new checkpoint. Se
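The guess above (an existence check that precedes each write) can be illustrated with a minimal sketch. It is a toy in-memory model, not the Hadoop ABFS driver or the Azure SDK: `FakeStore`, `create_path`, `write_metadata`, and the `checkpoints/chk-N/_metadata` layout are hypothetical names chosen for illustration. It only shows why a check-then-create pattern would log one 404 per new checkpoint even though every write succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a storage account (not a real API)."""
    paths: set = field(default_factory=set)
    request_log: list = field(default_factory=list)  # (operation, path, status)

    def get_path_status(self, path: str) -> int:
        # A status check on a path that does not exist yet is answered with 404.
        status = 200 if path in self.paths else 404
        self.request_log.append(("GetPathStatus", path, status))
        return status

    def create_path(self, path: str) -> int:
        self.paths.add(path)
        self.request_log.append(("CreatePath", path, 201))
        return 201


def write_metadata(store: FakeStore, checkpoint_id: int) -> str:
    """Assumed check-then-create pattern for a checkpoint's _metadata file."""
    path = f"checkpoints/chk-{checkpoint_id}/_metadata"
    if store.get_path_status(path) == 404:  # expected miss for a brand-new file
        store.create_path(path)
    return path


store = FakeStore()
for checkpoint_id in range(1, 4):
    write_metadata(store, checkpoint_id)

# Every 404 in the log comes from the pre-write status check.
not_found = [entry for entry in store.request_log if entry[2] == 404]
print(len(not_found), "PathNotFound responses for", len(store.paths), "successful writes")
```

Under this assumption the 404 count grows with the number of checkpoints, which would match a steady error rate in the storage metrics while checkpoints keep completing.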

Re: Continuous errors with Azure ABFSS

2023-10-05 Thread Alexis Sarda-Espinosa
Yes, that also works correctly, at least based on the Kafka source we use (we'd get an alert if it suddenly started consuming from a very old offset). Regards, Alexis. On Thu, 5 Oct 2023, 19:36 ramkrishna vasudevan wrote: > Sorry for the late reply. Just in case you restart the job, is it abl

Re: Continuous errors with Azure ABFSS

2023-10-05 Thread ramkrishna vasudevan
Sorry for the late reply. Just in case you restart the job, is it able to safely use the checkpoint and get back to the checkpointed state? Regards, Ram. On Thu, Sep 28, 2023 at 4:46 PM Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote: > Hi Surendra, > > there are no exceptions in the log

Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Alexis Sarda-Espinosa
Hi Surendra, there are no exceptions in the logs, nor anything salient at the INFO/WARN/ERROR levels. The checkpoints are definitely completing; we even set the config execution.checkpointing.tolerable-failed-checkpoints: 1. Regards, Alexis. On Thu, 28 Sept 2023 at 09:32, Surendra Sin

Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Surendra Singh Lilhore
Hi Alexis, could you please check the TaskManager log for any exceptions? Thanks, Surendra. On Thu, Sep 28, 2023 at 7:06 AM Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote: > Hello, > > We are using ABFSS for RocksDB's backend as well as the storage dir > required for Kubernetes HA. In t

Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Alexis Sarda-Espinosa
Hi Ram, thanks for that. We configure a path with the ABFSS scheme in the following settings:
- state.checkpoints.dir
- state.savepoints.dir
- high-availability.storageDir
We use RocksDB with incremental checkpointing every minute. I found the metrics from Azure in the storage account under Monitor
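For readers who want to picture the setup described above, here is a rough sketch that renders the corresponding configuration lines. The three path keys come from the message; the container name, account name, and sub-directories are hypothetical placeholders, and the RocksDB, incremental, and interval keys are assumed from Flink 1.x naming rather than taken from the thread.

```python
# Hypothetical container and storage account names (placeholders, not from the thread).
CONTAINER = "flink"
ACCOUNT = "examplestorage"
BASE = f"abfss://{CONTAINER}@{ACCOUNT}.dfs.core.windows.net"

settings = {
    # Assumed Flink 1.x keys for "RocksDB with incremental checkpointing every minute".
    "state.backend": "rocksdb",
    "state.backend.incremental": "true",
    "execution.checkpointing.interval": "1min",
    # The three settings named in the message; the sub-directories are made up.
    "state.checkpoints.dir": f"{BASE}/checkpoints",
    "state.savepoints.dir": f"{BASE}/savepoints",
    "high-availability.storageDir": f"{BASE}/ha",
}


def render(conf: dict) -> str:
    """Render the settings as 'key: value' lines, one per setting."""
    return "\n".join(f"{key}: {value}" for key, value in conf.items())


print(render(settings))
```

All three directories sit in the same storage account in this sketch, so checkpoint, savepoint, and HA traffic would all show up in that account's metrics.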

Re: Continuous errors with Azure ABFSS

2023-09-27 Thread ramkrishna vasudevan
Can you help with more info here? Is the RocksDB backend itself on ABFS instead of local storage? Or do you mean the checkpoints are on ABFS but the local dir for RocksDB is on local storage? Are the GetPathStatus calls made by your monitoring pages? We run Flink on ABFS, so we would like to see if we can help you out. Regar