[ 
https://issues.apache.org/jira/browse/FLINK-38035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-38035:
--------------------------------------

    Assignee: Niha

> Security Vulnerability in PyFlink Logging Mechanism (PythonEnvUtils.java)
> -------------------------------------------------------------------------
>
>                 Key: FLINK-38035
>                 URL: https://issues.apache.org/jira/browse/FLINK-38035
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>    Affects Versions: 1.19.1, 1.20.1
>            Reporter: Niha
>            Assignee: Niha
>            Priority: Major
>
> A logging statement in {{PythonEnvUtils.java}} may expose environment 
> variables, including Kubernetes-mounted secrets, during PyFlink job 
> submission.
> The class 
> [{{org.apache.flink.client.python.PythonEnvUtils}}|https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L372-L377]
>  logs all environment variables at job startup with the following line:
>  
> {{LOG.info("Starting Python process with environment variables: {}", 
> environment);}}
> This line is problematic because it indiscriminately logs *all environment 
> variables*, which may contain *sensitive credentials*.
> h4. *Context: Kubernetes Operator Users Are Especially at Risk*
> When Flink is deployed using the *Flink Kubernetes Operator*, secrets are 
> commonly passed into pods as *environment variables* (via Kubernetes {{env}} 
> or {{envFrom}} fields, e.g. from {{secretRef}}).
> This includes:
>  * Database credentials
>  * Cloud service keys (e.g., {{AWS_SECRET_ACCESS_KEY}})
>  * Tokens and encryption keys
>  * Custom user-defined secrets
> Logging these secrets in plain text within the Flink JobManager or 
> TaskManager logs violates Kubernetes security best practices, which 
> explicitly discourage exposing sensitive environment variables in logs, and 
> poses a serious risk in production environments.
> h4. *Proposed Fix*
>  * Redact known sensitive keys ({{SECRET}}, {{TOKEN}}, {{KEY}}, 
> {{PASSWORD}}, etc.) before logging.
> Example fix snippet:
> {{Map<String, String> redactedEnv = redactSensitive(environment);}}
> {{LOG.info("Starting Python process with environment variables: {}", 
> redactedEnv);}}
>  * Consider an opt-in mechanism (e.g., {{log.python.env=true}}) for full 
> environment visibility in safe/test setups.
> h4. *Steps to Reproduce*
>  # Set Kubernetes secrets as environment variables in a FlinkDeployment 
> (e.g., via {{envFrom.secretRef}}).
>  # Launch a PyFlink job using the Flink Kubernetes Operator.
>  # Examine the JobManager logs.
>  # Observe secrets printed via {{PythonEnvUtils.java}}.
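
The issue names a {{redactSensitive}} helper but does not show its body. The following is only a minimal sketch of what such a helper might look like, not the fix that was merged. It assumes that a key counts as sensitive when its upper-cased name contains one of the markers listed in the issue. The class name {{EnvRedactor}}, the marker list, and the mask string are hypothetical choices made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class EnvRedactor {

    // Hypothetical marker list, taken from the keys named in the issue text.
    private static final List<String> SENSITIVE_MARKERS =
            List.of("SECRET", "TOKEN", "KEY", "PASSWORD");

    // Hypothetical placeholder written in place of a sensitive value.
    private static final String MASK = "******";

    /**
     * Returns a sorted copy of the environment in which the value of every
     * key containing a sensitive marker (case-insensitive) is masked.
     * The input map is not modified.
     */
    public static Map<String, String> redactSensitive(Map<String, String> environment) {
        Map<String, String> redacted = new TreeMap<>();
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            String upperKey = entry.getKey().toUpperCase(Locale.ROOT);
            boolean sensitive = SENSITIVE_MARKERS.stream().anyMatch(upperKey::contains);
            redacted.put(entry.getKey(), sensitive ? MASK : entry.getValue());
        }
        return redacted;
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
                "PATH", "/usr/bin",
                "AWS_SECRET_ACCESS_KEY", "abc123",
                "db_password", "hunter2");
        // Only the redacted copy is passed to the log statement.
        System.out.println("Starting Python process with environment variables: "
                + redactSensitive(env));
    }
}
```

Substring matching errs on the side of over-redaction: any variable whose name merely contains {{KEY}} is masked as well. That seems an acceptable trade-off for a log line, but a real patch may prefer an allow-list or a configurable pattern.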



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
