Hi,

> * Share the GitHub Discussions link for the HooksCollector proposal.

You can go to 
https://github.com/apache/cloudberry/discussions/categories/ideas-feature-requests
and click the “New discussion” button to create a proposal.

Best,
Dianjin Wang

On Fri, Jan 30, 2026 at 6:24 PM Dianjin Wang <[email protected]> wrote:
>
> Hi all,
>
> Thanks for joining the first community meeting. Below is the meeting
> recap generated by AI and lightly edited by me for clarity. Please
> take it as a reference.
>
> - Meeting Notes:
> https://docs.google.com/document/d/14NLYVvApvijsQDt7uCKblVPKhayJSxb6na9dMAp5NAM/edit?usp=sharing
> - Meeting recording:
> https://fathom.video/share/xRtnrNXVr1P_1X2kQZ96nKRaPDEWSGCc (I will
> upload the recording to ASF Cloudberry Youtube Channel later.)
>
> ~~~~~
>
> # Meeting Purpose
>
> Kick off the first bi-weekly community meeting to align on progress
> and priorities.
>
> # Key Takeaways
>
> * PRs Blocked by Architectural Mismatch: Key PRs implementing
> Postgres-style features (e.g., parallel append) are stalled. They
> conflict with Cloudberry's MPP-style execution model, which requires
> pre-launching workers, unlike Postgres's dynamic approach.
>
> * PXF Roadmap Defined: The PXF roadmap has three stages: 1) sync with
> upstream Greenplum PXF, 2) integrate with the latest kernel (e.g.,
> parallel foreign table scans), and 3) add pushdown capabilities
> (aggregation, join).
>
> * New Extensions Proposed: Two new extensions were proposed:
> HooksCollector for performance monitoring and yezzey for S3 archiving
> of append-only tables to reduce storage costs.
>
> * Release 2.1: Release 2.1 is code-complete but blocked on testing and
> documentation. The new binary swap feature is confirmed working,
> enabling zero-downtime upgrades.
>
> # Topics
>
> 1. Main Repo & PR Review
>
> 1.1 Stalled PRs: A review of old, stalled PRs revealed a core
> architectural conflict.
> * Conflict: Postgres-style features (e.g., parallel append) rely on
> dynamic worker launching, which clashes with Cloudberry's MPP-style
> model of pre-launching workers before dispatching plans.
> * Action: Community reviews and feedback are encouraged to help find a 
> solution.
>
> 1.2 Dianjin's PRs need more reviews
>
> 2. Ecosystem Extensions
>
> 2.1 PXF (Platform Extension Framework)
> * Status: Code synced with upstream Greenplum PXF; source cleanup is
> in progress.
> * Roadmap:
>   - Sync: Catch up with the upstream Greenplum PXF branch.
>   - Integrate: Leverage the latest kernel's capabilities (e.g.,
> parallel foreign table scans) via the pxf_fdw framework.
>   - Pushdown: Add support for remote aggregation and join pushdown.
>   - Blocker: Orca does not currently support foreign data wrappers
> (FDWs), which PXF uses. This must be addressed for full integration.
>
> Warning: PXF's FDW implementation is not production-ready; VMware
> recommends it only in PXF 7.1.
>
> 2.2 WAL-G (Backup & Restore)
> * Status: No active development.
> * Gap: Untested with PAX storage, risking backup/restore failures.
> * Limitation: Does not support incremental backups for PAX tables due
> to their unique metadata.
> * Action: Max will provide PAX documentation to help the team
> understand its mechanics for WAL-G integration.
>
> 2.3 HooksCollector (Performance Monitoring)
> * Proposal: Open source the data-gathering component of Greenplum 6's
> Command Center.
> * Function: Collects query performance data via hooks and sends it
> externally via protobuf.
> * Goal: Attract community contributions and feedback.
> * Action: Dianjin will share the link for creating a formal proposal
> in GitHub Discussions.
>
> 2.4 Yezzey (S3 Archiving)
> * Proposal: An extension to upload/download append-only table data to/from S3.
> * Rationale: To reduce storage costs by moving cold data to cheaper
> object storage.
> * Action: Leonid will post the idea to the dev mailing list for public
> discussion.
>
> 2.5 Release & Governance
> Release 2.1:
> * Status: Code-complete on the Release 2 branch.
> * Blockers: Requires more testing and user-facing documentation for
> building from source.
> * Binary Swap: The new feature is confirmed working, enabling
> zero-downtime upgrades.
> * Release Manager: Ed volunteered but may be unavailable. Dianjin is the 
> backup.
>
> 3. Incubation Report: Leonid and Dianjin will collaborate on drafting
> the report.
>
> 4. Open Topics
>
> * 2026 Roadmap: Dianjin shared a draft roadmap on the dev mailing list
> for feedback.
> * Lakehouse Support: Leonid proposed adding Lakehouse support, noting
> high community interest in Russia.
> * Russian Documentation: Leonid's team will translate documentation to
> Russian and propose hosting it on the official Cloudberry site to
> create a single source of truth.
> * TPC-DS Benchmarking:
>  - Problem: Inconsistent TPC-DS test setups between teams yield
> non-comparable results, hindering effective performance tuning.
>  - Proposed Solution: Integrate a TPC-DS benchmark tool directly into
> the database kernel (like DuckDB) for easy, standardized execution.
>
> # Next Steps
>
> - Leonid:
>  * Post the yezzey S3 archiving proposal to the dev mailing list.
>  * Post the Lakehouse support idea to the dev mailing list.
>  * Collaborate with Dianjin on the incubation report.
>  * Host the next community meeting.
>
> - Dianjin:
> * Share the GitHub Discussions link for the HooksCollector proposal.
> * Confirm Ed's availability for the Release 2.1 manager role.
> * Share the 2026 roadmap draft on the dev mailing list.
> * Share the Shenzhen meetup materials (translated to English).
>
> - Max:
>  * Send PAX documentation to the team to aid WAL-G integration.
>
> - All:
>  * Review stalled PRs and provide feedback.
>  * Discuss the TPC-DS benchmark standardization proposal on the dev
> mailing list.
>
> Next Meeting:
> - Rescheduled to February 27th to accommodate the Chinese New Year holiday.
