satwikmishra11 commented on PR #15624: URL: https://github.com/apache/datafusion/pull/15624#issuecomment-2786251836
Certainly, thank you for considering my PR @berkaysynnada.

# Automated Performance Benchmarking Solution for Apache DataFusion

## Objective

Implement continuous performance monitoring using Conbench to:

- Catch performance regressions early
- Enable data-driven optimization decisions
- Provide historical trend analysis

---

## Solution Architecture

### 1. Automated Benchmark Execution (GitHub Actions)

`.github/workflows/benchmarks.yml`:

```yaml
name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 12 * * *'  # Daily runs (12:00 UTC)

env:
  CONBENCH_URL: https://datafusion-conbench.ursa.dev
  CONBENCH_EMAIL: datafusion-...@apache.org
  CONBENCH_PASSWORD: ${{ secrets.CONBENCH_PASSWORD }}

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          # Refresh the package index before installing system packages
          sudo apt-get update
          sudo apt-get install -y cmake
          pip install -e conbench/

      - name: Run benchmarks
        run: |
          cd conbench
          conbench run --python-datafusion --capture=no
```

### 2. PR Integration

`.github/workflows/benchmark-comment.yml`:

```yaml
name: Benchmark Results Comment

on:
  workflow_run:
    workflows: ["Performance Benchmarks"]
    types:
      - completed

# The default GITHUB_TOKEN may be read-only; commenting on a PR needs write access
permissions:
  pull-requests: write

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            // GitHub Script implementation
            // Posts comparison link to PR
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

### 3. Dashboard Setup

`conbench/docker-compose.yml`:

```yaml
version: '3'
services:
  conbench:
    image: conbench/conbench:latest
    ports:
      - "5000:5000"
    environment:
      - CONBENCH_DB_NAME=conbench
      - CONBENCH_DB_USER=postgres
      - CONBENCH_DB_PASSWORD=postgres
      - CONBENCH_DB_HOST=postgres
    depends_on:
      - postgres
  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=conbench
```
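The comparison that the PR workflow links to ultimately answers one question per benchmark: is the contender run slower than the baseline history by more than noise? Below is a minimal sketch of that check, assuming a list of historical timings (in seconds) for one benchmark on `main` plus a single contender timing. The names `is_regression` and `format_comment` are hypothetical, and this is not a description of how Conbench itself computes comparisons. It uses a one-sided z-test at `alpha=0.05` to match the p < 0.05 alerting threshold proposed in this plan.

```python
import math
from statistics import NormalDist, mean, stdev


def is_regression(history, contender, alpha=0.05):
    """Return (flag, z): flag is True if `contender` is significantly slower.

    One-sided z-test of a single contender timing against the historical
    baseline timings (both in seconds). alpha=0.05 matches a p < 0.05 threshold.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No observed noise in the baseline: any slowdown counts.
        z = 0.0 if contender == mu else math.copysign(math.inf, contender - mu)
    else:
        z = (contender - mu) / sigma
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha=0.05
    return z > critical, z


def format_comment(name, history, contender, alpha=0.05):
    """Render one markdown table row for a hypothetical PR comment."""
    flag, z = is_regression(history, contender, alpha)
    baseline = mean(history)
    change = (contender - baseline) / baseline * 100
    status = "possible regression" if flag else "ok"
    return f"| {name} | {change:+.1f}% | z={z:.2f} | {status} |"
```

For example, with a history of `[1.00, 1.02, 0.98, 1.01, 0.99]`, a contender of 1.10 s is flagged (z ≈ 6.3) while 1.01 s is not. A z-test against history is simple but assumes roughly normal, stable timings, which is one more reason the plan calls for a dedicated runner rather than shared CI machines.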
### 4. Benchmark Maintenance

Example benchmark (`conbench/benchmarks/sort.py`):

```python
import random
import time

import datafusion
import pyarrow as pa
# Benchmark base class and record() as proposed here; verify against the installed conbench version
from conbench import Benchmark


class SortBenchmark(Benchmark):
    name = "sort"

    def run(self, **kwargs):
        ctx = datafusion.SessionContext()
        # Register a reproducible in-memory table of random integers to sort
        rng = random.Random(42)
        data = {"x": [rng.randint(0, 1_000_000) for _ in range(1_000_000)]}
        ctx.register_record_batches("t", [[pa.RecordBatch.from_pydict(data)]])
        # Time a full ORDER BY; collect() forces the query to execute
        start = time.perf_counter()
        result = ctx.sql("SELECT x FROM t ORDER BY x").collect()
        duration = time.perf_counter() - start
        self.record({"time": duration}, {}, output=result)
```

## Implementation Checklist

### Secrets Configuration

- Add `CONBENCH_PASSWORD` in GitHub repository secrets
- Ensure `GITHUB_TOKEN` has appropriate permissions (e.g. `pull-requests: write` for the comment workflow)

### Infrastructure Requirements

- Dedicated runner for consistent benchmarking
- Conbench instance hosting (cloud/on-prem)

### Alert Configuration

- Set statistical significance threshold (p < 0.05)
- Configure notification channels (Slack/Email)

## Expected Outcomes

### Automated Execution

- PR-triggered benchmarks
- Daily performance snapshots
- Historical commit-associated data

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org