steveloughran opened a new pull request, #49779:
URL: https://github.com/apache/spark/pull/49779

   
   
   ### What changes were proposed in this pull request?
   
   When enabled, the cloud store client audit context is set to the
   same context string as the Hadoop IPC caller context.
   
   
   ### Why are the changes needed?
   
   `CallerContext` adds information about the Spark task to the Hadoop IPC
   context, and from there to the HDFS, YARN and HBase server logs.
   
   It is also possible to update the cloud storage "audit context".
   Storage clients can attach this audit information to requests so that it is
   stored in the service's own logs, where it can be retrieved, parsed and used
   for analysis.
   
   It is currently supported by the S3A connector, which adds the information
   to a synthetic referrer header that is then recorded in the S3 server logs
   (not in CloudTrail, sadly).
   
   See [S3A Auditing](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/auditing.html).
   
   ### Does this PR introduce _any_ user-facing change?
    
   If enabled, it adds extra information to cloud storage server logs, through
   cloud storage clients which support it.
   
   
   ### How was this patch tested?
   
   Expanded the existing test `"Set Spark CallerContext"` to verify that all
   passed-down parameters are set in both the caller and audit contexts.
   This required extracting the functional code of `CallerContext.setCurrentContext`
   into a `@VisibleForTesting private[util]` method `setCurrentContext(Boolean)`.
   
   Without this, the test suite only ran if the process had been launched
   with the configuration option `hadoop.caller.context.enabled` set to
   `true`. That is not the default, so the existing test suite code was
   probably never executed.
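   A minimal, self-contained Scala sketch of the refactoring described above. It is an illustration under stated assumptions, not the actual patch. `FakeIpcContext`, `FakeAuditContext`, `CallerContextSketch` and the audit key `"caller"` are hypothetical stand-ins, not the real Hadoop or Spark classes. Reading the flag from a JVM system property is also an assumption. The sketch shows two things: why a `setCurrentContext(Boolean)` overload lets a test exercise the enabled path regardless of launch configuration, and how the same context string is mirrored into both contexts.

```scala
// Hypothetical stand-ins for the Hadoop IPC caller context and the cloud
// store audit context. These are NOT the real Hadoop classes; they only
// keep per-thread state so the sketch is self-contained.
object FakeIpcContext {
  private val current = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  def set(ctx: String): Unit = current.set(Some(ctx))
  def get: Option[String] = current.get()
}

object FakeAuditContext {
  private val entries = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def put(key: String, value: String): Unit =
    entries.set(entries.get() + (key -> value))
  def get(key: String): Option[String] = entries.get().get(key)
}

// Hypothetical simplification of Spark's CallerContext.
class CallerContextSketch(val context: String) {

  // Production entry point: consults the flag, which defaults to false.
  // Reading it from a JVM system property is an assumption of this sketch.
  def setCurrentContext(): Unit =
    setCurrentContext(
      sys.props.get("hadoop.caller.context.enabled").exists(_.equalsIgnoreCase("true")))

  // Extracted body (in Spark: @VisibleForTesting private[util]); a test can
  // call it with `true` however the process was launched.
  def setCurrentContext(enabled: Boolean): Unit = {
    if (enabled) {
      FakeIpcContext.set(context)
      // Mirror the same string into the audit context under a hypothetical key.
      FakeAuditContext.put("caller", context)
    }
  }
}
```

   Keeping the flag check in the thin public method and the real work in the overload leaves the production call path unchanged. The test no longer depends on launch-time configuration.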
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
