satwikmishra11 commented on PR #15624:
URL: https://github.com/apache/datafusion/pull/15624#issuecomment-2786251836
Certainly, thank you for considering my PR, @berkaysynnada.
# Automated Performance Benchmarking Solution for Apache DataFusion
## Objective
Implement continuous performance monitoring using Conbench to:
- Catch performance regressions early
- Enable data-driven optimization decisions
- Provide historical trend analysis
---
## Solution Architecture
### 1. Automated Benchmark Execution (GitHub Actions)
`.github/workflows/benchmarks.yml`:
```yaml
name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 12 * * *' # Daily runs

env:
  CONBENCH_URL: https://datafusion-conbench.ursa.dev
  CONBENCH_EMAIL: [email protected]
  CONBENCH_PASSWORD: ${{ secrets.CONBENCH_PASSWORD }}

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          sudo apt-get install -y cmake
          pip install -e conbench/
      - name: Run benchmarks
        run: |
          cd conbench
          conbench run --python-datafusion --capture=no
```
### 2. PR Integration
`.github/workflows/benchmark-comment.yml`:
```yaml
name: Benchmark Results Comment

on:
  workflow_run:
    workflows: ["Performance Benchmarks"]
    types:
      - completed

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            // GitHub Script implementation
            // Posts comparison link to PR
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
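The comment step above leaves the script body as a placeholder. As a rough illustration of what "posts comparison link to PR" could produce, here is a minimal Python sketch that only builds the Markdown body of such a comment. The function name `build_comparison_comment`, the run IDs, and the `/compare/runs/<baseline>...<contender>/` URL layout are assumptions made for this example and should be checked against the deployed Conbench instance; posting the text to the PR is not shown.

```python
from urllib.parse import quote


def build_comparison_comment(conbench_url: str,
                             baseline_run_id: str,
                             contender_run_id: str) -> str:
    """Return a Markdown comment body that links to a run comparison.

    Assumption: the dashboard serves comparisons at
    <base>/compare/runs/<baseline>...<contender>/ (verify before use).
    """
    base = conbench_url.rstrip("/")
    # Percent-encode the IDs so unusual characters cannot break the URL.
    baseline = quote(baseline_run_id, safe="")
    contender = quote(contender_run_id, safe="")
    link = f"{base}/compare/runs/{baseline}...{contender}/"
    return (
        "### Benchmark results\n\n"
        f"Contender run `{contender_run_id}` compared against "
        f"baseline run `{baseline_run_id}`: [view comparison]({link})"
    )


if __name__ == "__main__":
    # Hypothetical run IDs, for illustration only.
    print(build_comparison_comment(
        "https://datafusion-conbench.ursa.dev", "main-run", "pr-run"))
```

Keeping the text generation separate from the API call makes it easy to unit-test the comment format without touching GitHub.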
### 3. Dashboard Setup
`conbench/docker-compose.yml`:
```yaml
version: '3'
services:
  conbench:
    image: conbench/conbench:latest
    ports:
      - "5000:5000"
    environment:
      - CONBENCH_DB_NAME=conbench
      - CONBENCH_DB_USER=postgres
      - CONBENCH_DB_PASSWORD=postgres
      - CONBENCH_DB_HOST=postgres
    depends_on:
      - postgres
  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=conbench
```
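### 4. Benchmark Maintenance
Example benchmark (`conbench/benchmarks/sort.py`):
```python
import time

import datafusion
# Import path and record() signature are as given in this proposal;
# verify them against the installed Conbench version.
from conbench import Benchmark


class SortBenchmark(Benchmark):
    name = "sort"

    def run(self, **kwargs):
        ctx = datafusion.SessionContext()
        start = time.perf_counter()
        # Benchmark implementation: placeholder query, replace with the real sort workload
        result = ctx.sql("SELECT 1 AS x ORDER BY x").collect()
        duration = time.perf_counter() - start
        self.record(
            {"time": duration},
            {},
            output=result,
        )
```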
## Implementation Checklist
- Secrets Configuration
  - Add `CONBENCH_PASSWORD` in GitHub repository secrets
  - Ensure `GITHUB_TOKEN` has appropriate permissions
- Infrastructure Requirements
  - Dedicated runner for consistent benchmarking
  - Conbench instance hosting (cloud/on-prem)
- Alert Configuration
  - Set statistical significance threshold (p < 0.05)
  - Configure notification channels (Slack/Email)
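To make the p < 0.05 threshold from the alert configuration concrete, the following is a minimal sketch of one way a new timing could be tested against historical timings. It uses a one-sided z-test under a normal approximation, which is an assumption chosen for brevity and not necessarily the method Conbench applies; the function name `is_regression` and the sample timings are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def is_regression(baseline_times, new_time, alpha=0.05):
    """Return True if new_time is significantly slower than the baseline.

    Assumption: baseline timings are roughly normally distributed, so a
    one-sided z-test on the new measurement is a reasonable approximation.
    """
    if len(baseline_times) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline_times)
    sigma = stdev(baseline_times)
    if sigma == 0:
        # No variance in the baseline: any slowdown counts as a regression.
        return new_time > mu
    z = (new_time - mu) / sigma
    # Probability of seeing a value at least this slow by chance.
    p_value = 1.0 - NormalDist().cdf(z)
    return p_value < alpha


if __name__ == "__main__":
    history = [1.00, 1.02, 0.98, 1.01, 0.99]  # hypothetical timings in seconds
    print(is_regression(history, 1.10))
    print(is_regression(history, 1.005))
```

With only a handful of baseline samples the normal approximation is rough, so a larger lookback window or a more robust test may be preferable on a noisy runner.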
## Expected Outcomes
- Automated Execution
  - PR-triggered benchmarks
  - Daily performance snapshots
  - Historical commit-associated data
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]