cnauroth opened a new pull request, #49858:
URL: https://github.com/apache/spark/pull/49858

   ### What changes were proposed in this pull request?
   
   Initialize the Hadoop RPC `CallerContext` during History Server startup, before any `FileSystem` access, so that calls to HDFS are tagged in the audit log as originating from the History Server.
   
   ### Why are the changes needed?
   
   Other Spark processes set the `CallerContext` so that additional auditing context propagates with Hadoop RPC calls. This PR provides the same auditing context for calls from the History Server. Other callers also include details such as the app ID and attempt ID; those are not included here, because the History Server serves multiple apps and attempts.
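
   As a hedged sketch (not the exact patch), the startup hook can use Hadoop's public `org.apache.hadoop.ipc.CallerContext` API; the `Builder` and `setCurrent` names below are Hadoop's, while the placement in a standalone object is illustrative only:

   ```
   import org.apache.hadoop.ipc.CallerContext

   object HistoryServerCallerContextSketch {
     // Run once during History Server startup, before the first FileSystem
     // access, so subsequent HDFS RPCs from this process carry the tag and
     // the NameNode audit log records callerContext=SPARK_HISTORY.
     def initCallerContext(): Unit = {
       val context = new CallerContext.Builder("SPARK_HISTORY").build()
       CallerContext.setCurrent(context)
     }
   }
   ```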
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. In environments that configure `hadoop.caller.context.enabled=true`, 
users will now see additional information in the HDFS audit logs explicitly 
stating that calls originated from the History Server.
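
   For reference, the property above is a standard Hadoop setting, typically enabled in `core-site.xml`; the snippet below is a generic example, not part of this patch:

   ```
   <property>
     <name>hadoop.caller.context.enabled</name>
     <value>true</value>
   </property>
   ```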
   
   ### How was this patch tested?
   
   A new unit test has been added. All tests pass in the history package.
   
   ```
   build/mvn -pl core test -Dtest=none -DmembersOnlySuites=org.apache.spark.deploy.history
   ```
   
   When the changes are deployed to a running cluster, the new caller context 
is visible in the HDFS audit logs.
   
   ```
   2025-02-07 23:00:54,657 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0012  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,699 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,729 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,743 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,755 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:00:54,779 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=open  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   2025-02-07 23:01:04,160 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true  ugi=spark (auth:SIMPLE)  ip=/10.240.5.205  cmd=listStatus  src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history  dst=null  perm=null  proto=rpc  callerContext=SPARK_HISTORY
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

