sarutak commented on PR #54575:
URL: https://github.com/apache/spark/pull/54575#issuecomment-4006323577
@LuciferYang Thank you for pointing it out. While event log filenames are
typically unique in normal operation, duplicates can occur during migration or
multi-cluster log aggregation.
I've fixed `resolveLogPath` to use the source directory hint
(`logSourceFullPath`) that is already recorded during indexing:
1. When `logSourceFullPath` is available, the method first checks that
specific directory.
2. If the file is not found there (e.g., the file was moved), it falls back to
scanning all directories.
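
To illustrate the two-step lookup described above, here is a minimal sketch in Python. It is not the actual Spark code: the function name `resolve_log_path`, its parameters, and the assumption that the hint is a directory path are all hypothetical and used only for illustration.

```python
import os
from typing import Iterable, Optional


def resolve_log_path(
    log_name: str,
    log_dirs: Iterable[str],
    source_dir_hint: Optional[str] = None,
) -> Optional[str]:
    """Return the full path of log_name, preferring the hinted directory.

    Hypothetical stand-in for the lookup described in the comment;
    source_dir_hint plays the role of the recorded source directory.
    """
    # Step 1: if a hint was recorded at indexing time, check that directory first.
    if source_dir_hint is not None:
        candidate = os.path.join(source_dir_hint, log_name)
        if os.path.exists(candidate):
            return candidate

    # Step 2: the hint is missing or stale (e.g. the file was moved),
    # so fall back to scanning every configured directory in order.
    for log_dir in log_dirs:
        candidate = os.path.join(log_dir, log_name)
        if os.path.exists(candidate):
            return candidate

    # The file was not found anywhere.
    return None
```

With two directories that both contain a file of the same name, passing the hint selects the copy in the hinted directory, while omitting it returns the first match in scan order.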
I also added a test based on the scenario you described.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]