vaquar khan created SPARK-56040:
-----------------------------------
Summary: Spark Automated Integrity Validation (AIV) Gate
Key: SPARK-56040
URL: https://issues.apache.org/jira/browse/SPARK-56040
Project: Spark
Issue Type: Improvement
Components: Build, Project Infra
Affects Versions: 4.1.1
Reporter: vaquar khan
*Background / Motivation:* The open-source ecosystem is seeing a surge in
low-quality, automated pull requests ("AI slop"). Given the contribution
volume Spark handles, relying purely on PR template checkboxes (soft controls)
is becoming unsustainable for maintainers. Spark needs a deterministic "hard
control" that catches structurally flawed submissions before they reach a
human reviewer.
*Proposed Solution:* We propose adding an Automated Integrity Validation (AIV)
Gate to Spark's CI pipeline. This tool will perform deterministic, AST-based
build validation to catch the two most damaging categories of low-quality
contributions:
- Scaffolding-heavy PRs with no real logic (boilerplate inflation) via Logic
Density Ratio (LDR) validation.
- Code that violates Spark's specific architectural rules (domain-specific
anti-patterns) via a declarative YAML-based design compliance checker.
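To make the LDR idea concrete, here is a minimal sketch. It is not the proposed implementation: the proposal targets Scala via tree-sitter-scala, whereas this sketch swaps in Python's standard `ast` module so that it runs without extra packages. The proposal does not define the ratio, so the definition used here (logic statements divided by all statements) and the `SCAFFOLDING` classification are assumptions made for illustration.

```python
import ast

# Assumption: statement types treated as scaffolding rather than logic.
SCAFFOLDING = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.Import, ast.ImportFrom, ast.Pass,
)


def _is_docstring(node: ast.stmt) -> bool:
    """True for a bare string-literal expression statement."""
    return (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    )


def logic_density_ratio(source: str) -> float:
    """Hypothetical LDR: logic statements / all statements (0.0 if none)."""
    statements = [n for n in ast.walk(ast.parse(source))
                  if isinstance(n, ast.stmt)]
    if not statements:
        return 0.0
    logic = [n for n in statements
             if not isinstance(n, SCAFFOLDING) and not _is_docstring(n)]
    return len(logic) / len(statements)


boilerplate = "class A:\n    pass\n\nclass B:\n    def f(self):\n        pass\n"
dense = "def f(x):\n    if x > 0:\n        return x * 2\n    return 0\n"
print(logic_density_ratio(boilerplate), logic_density_ratio(dense))
```

Under this definition a file made only of empty classes and stubs scores 0.0, while a small function with a branch and two returns scores 0.75. Any real threshold would have to be calibrated on actual Spark PRs.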
*Technical Implementation:*
- Written in Python, building on existing precedents such as
dev/structured_logging_style.py.
- Uses tree-sitter-scala and jAST for robust, source-level AST parsing.
- Runs entirely locally within the lint job of the existing
.github/workflows/build_and_test.yml, with no external dependencies or API
calls, to prevent supply-chain attacks.
- Includes a secure, GPG-signed /aiv skip bypass for committers, so that
maintainers are never blocked during release freezes.
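The declarative design compliance check could work roughly as follows. This is a sketch under stated assumptions: the rule schema (`rule_id`, `banned_call`, `message`) and the example rule are hypothetical and are not taken from the SPIP. The rules are written as a Python list standing in for the YAML file, and Python's `ast` module stands in for the Scala parser so that the example stays self-contained.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str      # hypothetical identifier
    banned_call: str  # function or method name that must not be called
    message: str      # text shown to the contributor


# Stand-in for rules that would be loaded from a YAML file.
RULES = [
    Rule("no-print", "print", "use the logging framework instead of print()"),
]


def _call_name(node: ast.Call) -> str:
    """Return the called name for `f(...)` or `obj.f(...)`, else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def check(source: str, rules: list) -> list:
    """Return (rule_id, line, message) tuples sorted by line number."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        for rule in rules:
            if name == rule.banned_call:
                violations.append((rule.rule_id, node.lineno, rule.message))
    return sorted(violations, key=lambda v: v[1])


sample = "import logging\nprint('x')\nlogging.info('y')\n"
for violation in check(sample, RULES):
    print(violation)
```

Keeping the rules as data rather than code means a new architectural constraint can be added by editing the rule file, without changing the checker itself.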
*Rollout Plan:* The plugin architecture is modular. We propose an initial
deployment in a non-blocking "Shadow Mode" to collect baseline data, calibrate
LDR thresholds, and ensure zero disruption to current contributor workflows.
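One way to read "non-blocking Shadow Mode" is that the gate always reports its findings but only fails the CI job once enforcement is switched on. The sketch below illustrates that reading. The function name `gate_exit_code` and its parameters are hypothetical, and the proposal may define the behavior differently.

```python
import sys


def gate_exit_code(violations, shadow_mode=True, stream=sys.stderr):
    """Report every violation; fail (return 1) only when not in shadow mode."""
    prefix = "AIV shadow" if shadow_mode else "AIV"
    for violation in violations:
        print(f"[{prefix}] {violation}", file=stream)
    if shadow_mode:
        # Shadow mode: findings are logged for calibration but never block.
        return 0
    return 1 if violations else 0


if __name__ == "__main__":
    found = ["LDR 0.12 is below the (hypothetical) threshold 0.30"]
    sys.exit(gate_exit_code(found, shadow_mode=True))
```

In a CI step, the return value would become the process exit status, so flipping `shadow_mode` to `False` is the only change needed to move from data collection to enforcement.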
*References*
SPIP Document:
https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?usp=sharing
dev@ Mailing List Discussion: