Hi,

> * Share the GitHub Discussions link for the HooksCollector proposal.

You can go to
https://github.com/apache/cloudberry/discussions/categories/ideas-feature-requests
and click the “New discussion” button to create a proposal.

Best,
Dianjin Wang

On Fri, Jan 30, 2026 at 6:24 PM Dianjin Wang <[email protected]> wrote:
>
> Hi all,
>
> Thanks for joining the first community meeting. Below is the meeting
> recap generated by AI and lightly edited by me for clarity. Please
> take it as a reference.
>
> - Meeting Notes:
> https://docs.google.com/document/d/14NLYVvApvijsQDt7uCKblVPKhayJSxb6na9dMAp5NAM/edit?usp=sharing
> - Meeting recording:
> https://fathom.video/share/xRtnrNXVr1P_1X2kQZ96nKRaPDEWSGCc (I will
> upload the recording to the ASF Cloudberry YouTube channel later.)
>
> ~~~~~
>
> # Meeting Purpose
>
> Kick off the first bi-weekly community meeting to align on progress
> and priorities.
>
> # Key Takeaways
>
> * PRs Blocked by Architectural Mismatch: Key PRs implementing
> Postgres-style features (e.g., parallel append) are stalled. They
> conflict with Cloudberry's MPP-style execution model, which requires
> pre-launching workers, unlike Postgres's dynamic approach.
>
> * PXF Roadmap Defined: The PXF roadmap has three stages: 1) sync with
> upstream Greenplum PXF, 2) integrate with the latest kernel (e.g.,
> parallel foreign table scans), and 3) add pushdown capabilities
> (aggregation, join).
>
> * New Extensions Proposed: Two new extensions were proposed:
> HooksCollector for performance monitoring and yezzey for S3 archiving
> of append-only tables to reduce storage costs.
>
> * Release 2.1: Release 2.1 is code-complete but blocked on testing and
> documentation. The new binary swap feature is confirmed working,
> enabling zero-downtime upgrades.
>
> # Topics
>
> 1. Main Repo & PR Review
>
> 1.1 Stalled PRs: A review of old, stalled PRs revealed a core
> architectural conflict.
> * Conflict: Postgres-style features (e.g., parallel append) rely on
> dynamic worker launching, which clashes with Cloudberry's MPP-style
> model of pre-launching workers before dispatching plans.
> * Action: Community reviews and feedback are encouraged to help find a
> solution.
>
> 1.2 Dianjin's PRs need more reviews
>
> 2. Ecosystem Extensions
>
> 2.1 PXF (Platform Extension Framework)
> * Status: Code synced with upstream Greenplum PXF; source cleanup is
> in progress.
> * Roadmap:
> - Sync: Catch up with the upstream Greenplum PXF branch.
> - Integrate: Leverage the latest kernel's capabilities (e.g.,
> parallel foreign table scans) via the pxf_fdw framework.
> - Pushdown: Add support for remote aggregation and join pushdown.
> - Blocker: Orca does not currently support foreign data wrappers
> (FDWs), which PXF uses. This must be addressed for full integration.
>
> Warning: PXF's FDW implementation is not production-ready; VMware
> recommends it only in PXF 7.1.
>
> 2.2 WAL-G (Backup & Restore)
> * Status: No active development.
> * Gap: Untested with PAX storage, risking backup/restore failures.
> * Limitation: Does not support incremental backups for PAX tables due
> to their unique metadata.
> * Action: Max will provide PAX documentation to help the team
> understand its mechanics for WAL-G integration.
>
> 2.3 HooksCollector (Performance Monitoring)
> * Proposal: Open source the data-gathering component of Greenplum 6's
> Command Center.
> * Function: Collects query performance data via hooks and sends it
> externally via protobuf.
> * Goal: Attract community contributions and feedback.
> * Action: Dianjin will share the link for creating a formal proposal
> in GitHub Discussions.
>
> 2.4 Yezzey (S3 Archiving)
> * Proposal: An extension to upload/download append-only table data to/from S3.
> * Rationale: To reduce storage costs by moving cold data to cheaper
> object storage.
> * Action: Leonid will post the idea to the dev mailing list for public
> discussion.
>
> 2.5 Release & Governance
> Release 2.1:
> * Status: Code-complete on the Release 2 branch.
> * Blockers: Requires more testing and user-facing documentation for
> building from source.
> * Binary Swap: The new feature is confirmed working, enabling
> zero-downtime upgrades.
> * Release Manager: Ed volunteered but may be unavailable. Dianjin is the
> backup.
>
> 3. Incubation Report: Leonid and Dianjin will collaborate on drafting
> the report.
>
> 4. Open Topics
>
> * 2026 Roadmap: Dianjin shared a draft roadmap on the dev mailing list
> for feedback.
> * Lakehouse Support: Leonid proposed adding Lakehouse support, noting
> high community interest in Russia.
> * Russian Documentation: Leonid's team will translate the documentation
> into Russian and propose hosting it on the official Cloudberry site to
> create a single source of truth.
> * TPC-DS Benchmarking:
> - Problem: Inconsistent TPC-DS test setups between teams yield
> non-comparable results, hindering effective performance tuning.
> - Proposed Solution: Integrate a TPC-DS benchmark tool directly into
> the database kernel (like DuckDB) for easy, standardized execution.
>
> # Next Steps
>
> - Leonid:
> * Post the yezzey S3 archiving proposal to the dev mailing list.
> * Post the Lakehouse support idea to the dev mailing list.
> * Collaborate with Dianjin on the incubation report.
> * Host the next community meeting.
>
> - Dianjin:
> * Share the GitHub Discussions link for the HooksCollector proposal.
> * Confirm Ed's availability for the Release 2.1 manager role.
> * Share the 2026 roadmap draft on the dev mailing list.
> * Share the Shenzhen meetup materials (translated to English).
>
> - Max:
> * Send PAX documentation to the team to aid WAL-G integration.
>
> - All:
> * Review stalled PRs and provide feedback.
> * Discuss the TPC-DS benchmark standardization proposal on the dev
> mailing list.
>
> Next Meeting:
> - Rescheduled to February 27th to accommodate the Chinese New Year holiday.
