sarutak commented on PR #54575: URL: https://github.com/apache/spark/pull/54575#issuecomment-3998215797
@dongjoon-hyun Thank you for your feedback.

> Do you want to have per-directory configurations in the future?

I considered that per-directory configurations (e.g. `spark.history.fs.cleaner.*`) might be helpful, but such configurations are not supported, at least in this PR. I'd like to start with simple global settings and improve them based on user feedback.

> For now, spark.history.fs.update.interval is supposed to be applied for one scan for all directories?

Yes.

> spark.history.fs.cleaner.interval is also supposed to be applied for one scan for all directories?

Yes.

> When spark.history.fs.cleaner.maxNum is applied,
> This PR will consider the total number of files for all directories, right?
> Which directory will be selected as a victim for the tie?

Yes, the property is applied to the total number of log entries across all directories. As the updated documentation says, when the limit is exceeded, the oldest completed attempts are deleted first, regardless of which directory they belong to.

> Since this introduces lots of ambiguity a little, could you revise the PR title and provide a corresponding documentation update, docs, together in this PR?

Updated. (You said `revise the PR title`, but I assumed that was a typo for "PR description", so I've updated only the description.)
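For reference, a minimal sketch of the global cleaner settings discussed in this thread; the values are illustrative, not recommendations, and per-directory variants are intentionally out of scope here:

```
# spark-defaults.conf (illustrative values)
spark.history.fs.update.interval   10s   # one scan pass covers all configured log directories
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.interval  1d    # likewise applied as a single global pass
spark.history.fs.cleaner.maxNum    5000  # cap on total log entries across all directories;
                                         # oldest completed attempts are deleted first
```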
