Niha created FLINK-38035:
----------------------------

             Summary: Security Vulnerability in PyFlink Logging Mechanism (PythonEnvUtils.java)
                 Key: FLINK-38035
                 URL: https://issues.apache.org/jira/browse/FLINK-38035
             Project: Flink
          Issue Type: Bug
          Components: API / Python
    Affects Versions: 1.20.1, 1.19.1
            Reporter: Niha

The logging statement in {{PythonEnvUtils.java}} may expose environment variables, including Kubernetes-mounted secrets, during PyFlink job submission.

The class [{{org.apache.flink.client.python.PythonEnvUtils}}|https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L372-L377] logs all environment variables at job startup with the following line:

{{LOG.info("Starting Python process with environment variables: {}", environment);}}

This line is problematic because it indiscriminately logs *all environment variables*, which may contain *sensitive credentials*.

h4. *Context: Kubernetes Operator Users Are Especially at Risk*

When Flink is deployed using the *Flink Kubernetes Operator*, secrets are commonly passed into pods as *environment variables* (via the Kubernetes {{env}} or {{envFrom}} fields, e.g. from {{secretRef}}). This includes:
* Database credentials
* Cloud service keys (e.g., {{AWS_SECRET_ACCESS_KEY}})
* Tokens and encryption keys
* Custom user-defined secrets

Logging these secrets in plain text in the Flink JobManager or TaskManager logs violates Kubernetes security best practices, which explicitly discourage exposing sensitive environment variables in logs, and poses a serious risk in production environments.

h4. *Proposed Fix*

* Redact the values of known sensitive keys ({{*_SECRET_*}}, {{*_TOKEN}}, {{*_KEY}}, {{PASSWORD}}, etc.) before logging. Example fix snippet:
{{Map<String, String> redactedEnv = redactSensitive(environment); LOG.info("Starting Python process with environment variables: {}", redactedEnv);}}
* Consider an opt-in mechanism (e.g., {{log.python.env=true}}) for full environment visibility in safe/test setups.

h4. *Steps to Reproduce*

# Set Kubernetes secrets as environment variables in a FlinkDeployment (e.g., via {{envFrom.secretRef}}).
# Launch a PyFlink job using the Flink Kubernetes Operator.
# Examine the JobManager logs.
# Observe the secrets printed by the logging statement in {{PythonEnvUtils.java}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
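
For illustration, the {{redactSensitive}} helper named under Proposed Fix could look roughly like the sketch below. This is not existing Flink code: the class name {{EnvRedactor}}, the key pattern, and the {{******}} placeholder are assumptions made for the example. The pattern matches key substrings case-insensitively, which is deliberately broader than the globs listed in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Flink. */
final class EnvRedactor {

    // Assumed list of sensitive key fragments. A substring match over-redacts
    // (e.g. KEYBOARD_LAYOUT is masked too), which errs on the safe side.
    private static final Pattern SENSITIVE_KEY =
            Pattern.compile(".*(SECRET|TOKEN|PASSWORD|CREDENTIAL|KEY).*");

    // Placeholder written to the log instead of the real value (assumed).
    static final String REDACTED = "******";

    private EnvRedactor() {}

    /** Returns a copy of the map with the values of sensitive-looking keys masked. */
    static Map<String, String> redactSensitive(Map<String, String> environment) {
        Map<String, String> redacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            String upperKey = entry.getKey().toUpperCase(Locale.ROOT);
            boolean sensitive = SENSITIVE_KEY.matcher(upperKey).matches();
            redacted.put(entry.getKey(), sensitive ? REDACTED : entry.getValue());
        }
        return redacted;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("PATH", "/usr/bin");
        env.put("AWS_SECRET_ACCESS_KEY", "example-value");
        env.put("DB_PASSWORD", "example-value");
        // Only the masked copy is ever passed to the logger.
        System.out.println("Starting Python process with environment variables: "
                + redactSensitive(env));
    }
}
```

The helper returns a copy rather than mutating its argument, so the Python process would still receive the original values and only the logged map is masked. An opt-in flag such as the {{log.python.env=true}} suggested above could skip the redaction step in test setups.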