cnauroth opened a new pull request, #49858: URL: https://github.com/apache/spark/pull/49858
### What changes were proposed in this pull request?

Initialize the Hadoop RPC `CallerContext` during History Server startup, before any `FileSystem` access, so that calls to HDFS are tagged in the audit log as originating from the History Server.

### Why are the changes needed?

Other Spark processes already set the `CallerContext`, so that additional auditing context propagates in Hadoop RPC calls. This PR provides the same auditing context for calls from the History Server. Other callers also supply additional information such as the application ID and attempt ID; we don't provide that here, because the History Server serves multiple applications and attempts.

### Does this PR introduce _any_ user-facing change?

Yes. In environments that configure `hadoop.caller.context.enabled=true`, users will now see additional information in the HDFS audit logs explicitly stating that calls originated from the History Server.

### How was this patch tested?

A new unit test has been added. All tests pass in the history package:

```
build/mvn -pl core test -Dtest=none -DmembersOnlySuites=org.apache.spark.deploy.history
```

When the changes are deployed to a running cluster, the new caller context is visible in the HDFS audit logs.
```
2025-02-07 23:00:54,657 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0012 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,699 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,729 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,743 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,755 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,779 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:01:04,160 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=listStatus src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
```

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
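For illustration, the startup hook described above can be sketched using Spark's existing `org.apache.spark.util.CallerContext` utility, which other Spark daemons use to tag RPCs. The exact context string and insertion point below are assumptions, not the PR's literal diff; the audit-log excerpt suggests the effective value is `SPARK_HISTORY` (Spark prefixes the supplied string with `SPARK_`).

```scala
import org.apache.spark.util.CallerContext

object HistoryServerSketch {
  def main(argStrings: Array[String]): Unit = {
    // Set the caller context before the first FileSystem access so that
    // every subsequent HDFS RPC from this process carries
    // callerContext=SPARK_HISTORY in the NameNode audit log
    // (visible only when hadoop.caller.context.enabled=true).
    // "HISTORY" is an illustrative value inferred from the logs above.
    new CallerContext("HISTORY").setCurrentContext()

    // ... existing startup: load configuration, create the history
    // provider (which reads the event-log directory), bind the UI ...
  }
}
```

Because the context is set process-wide before any `FileSystem` call, no per-request plumbing is needed; the History Server intentionally omits app ID and attempt ID, since one process serves many applications.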
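For completeness, the caller context only reaches the NameNode audit log when propagation is enabled cluster-side. A minimal sketch of the relevant Hadoop setting (a standard `core-site.xml` property; it defaults to `false`):

```xml
<!-- core-site.xml: enable caller-context propagation in Hadoop RPC,
     so callerContext=... appears in the HDFS audit log -->
<property>
  <name>hadoop.caller.context.enabled</name>
  <value>true</value>
</property>
```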