Hi Jiaqi,
Thanks again for explaining the reasoning behind splitting the PAX PR.
Your concerns about "merge main" are well-taken — it introduces
non-linear history, complicates git bisect, and can lead to downstream
integration issues. It's clear that the decision to split was made
carefully under release pressure, and I appreciate the open dialogue
around this.
Looking forward, I’d like to propose a Git CLI-based workflow that can
help us avoid splitting large PRs in the future — even when commit
count exceeds GitHub’s "Rebase and Merge" limit in the UI.
This approach allows us to:
- Preserve full commit history (no squash)
- Avoid splitting logically complete work
- Maintain linear history for bisectability and readability
- Cleanly integrate with downstream forks if needed
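As a quick way to tell whether a branch is likely to hit the UI limit, the commits it carries on top of main can be counted locally. Below is a minimal sketch in a throwaway repository. The branch name feature/demo and the three empty commits are made up for illustration; against the real repository the range would be origin/main..feature/your-branch.

```shell
set -eu

# Throwaway repository so the sketch does not touch any real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.email "demo@example.com" # hypothetical identity for the demo
git config user.name "Demo"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"

# Hypothetical feature branch with three commits on top of main.
git checkout -q -b feature/demo
for i in 1 2 3; do
  git commit -q --allow-empty -m "feature commit $i"
done

# Count the commits reachable from the feature branch but not from main.
count=$(git rev-list --count main..feature/demo)
echo "commits ahead of main: $count"
```

If the count is well past the 100+ range discussed in this thread, the CLI workflow is probably the safer path.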
Proposed Workflow:
------------------
# 1. Rebase the feature branch onto the latest main
git checkout feature/your-branch
git fetch origin
git rebase origin/main
# 2. Push the rebased feature branch
git push --force-with-lease
# 3. After PR approval, ensure main is still current
git fetch origin
git checkout main
git pull origin main
# 4. If main has progressed, rebase the feature branch again
git checkout feature/your-branch
git rebase origin/main
git push --force-with-lease
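# 5. Fast-forward main to the rebased branch using CLI
#    (--ff-only adds no merge commit, so history stays linear;
#    --no-ff would always create a merge commit)
git checkout main
git merge --ff-only feature/your-branch
git push origin main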
This process:
- Avoids GitHub UI merge limitations
- Keeps the commit graph clean and linear
- Ensures CI validation is accurate and relevant
- Preserves the full contribution context
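To make the effect of the final merge step on the commit graph concrete, here is a small sketch that compares the two merge modes in a throwaway repository. The branch names (feature/demo, main-no-ff, main-ff-only) are hypothetical and exist only for this comparison. It counts merge commits after each kind of merge of an already rebased branch.

```shell
set -eu

# Throwaway repository so the sketch does not touch any real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.email "demo@example.com" # hypothetical identity for the demo
git config user.name "Demo"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"

# Hypothetical feature branch that already sits directly on top of main.
git checkout -q -b feature/demo
git commit -q --allow-empty -m "feature 1"
git commit -q --allow-empty -m "feature 2"

# Variant A: --no-ff always records a merge commit.
git checkout -q -b main-no-ff main
git merge -q --no-ff -m "Merge feature/demo" feature/demo
merges_no_ff=$(git rev-list --merges --count main-no-ff)

# Variant B: --ff-only just moves the branch pointer forward.
git checkout -q -b main-ff-only main
git merge -q --ff-only feature/demo
merges_ff_only=$(git rev-list --merges --count main-ff-only)

echo "merge commits with --no-ff:   $merges_no_ff"
echo "merge commits with --ff-only: $merges_ff_only"
```

A fast-forward leaves main pointing at exactly the commits that were reviewed, which is the closest local equivalent to what "Rebase and Merge" produces. A --no-ff merge keeps the same commits but adds one merge commit on top.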
Next Steps:
-----------
If this general approach makes sense, I’d be happy to help document it
in our committer or contributor guidelines. I’d also love to hear from
others — especially those maintaining downstream forks or submitting
larger features.
Thanks again, Jiaqi, for leading the PAX work and for raising the
trade-offs so thoughtfully. It’s through these conversations that we
build a better process together.
Best,
-=e
On Thu, Apr 10, 2025 at 7:51 AM jiaqi.zhou <[email protected]> wrote:
> Hi all,
>
>
>
>
> My colleagues and I have internally discussed the option of using a "merge
> main" approach to bypass the "100+ commit rebase and merge problem".
>
>
>
>
> Why not "merge main"?
>
> - Non-linear History: Merging main would create a non-linear commit graph.
>
> - Impact on Git Bisect: This could complicate debugging workflows like git
> bisect.
>
> - Downstream Compatibility: Projects forked from CloudBerryDB with
> divergent codebases might face integration challenges.
>
>
>
>
> Why did we choose to split the PR?
>
> PAX has had internal CI and code review since the project was launched, and
> every commit is complete (which is why we did not choose to squash). After
> the split PRs are merged, the commits are linear.
>
>
>
>
> With the CBDB release approaching, please let us discuss this topic as
> soon as possible.
>
> Thanks
> Jiaqi
>
>
> On 2025-04-10 22:01:09, "Ed Espino" <[email protected]> wrote:
> >Hi all,
> >
> >I’d like to raise a contribution workflow concern we're currently
> >encountering in Apache Cloudberry (Incubating), and propose that we
> >establish a preferred approach for handling similar situations going
> >forward.
> >
> >Contributor *@jiaqizho* submitted a significant pull request:
> >*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
> >engine <https://github.com/apache/cloudberry/pull/1002>*
> >
> >The PR contains *300+ commits* and has successfully passed CI. However, due
> >to the number of commits, GitHub's *“Rebase and Merge”* option is disabled,
> >a known limitation when the PR size exceeds certain internal thresholds.
> >As a result, the PR cannot be merged via the web UI, even by committers
> >with full permissions.
> >
> >In response, the contributor has now *split the PR into four smaller PRs*
> >in an attempt to work around the UI limitation and proceed with merging.
> >------------------------------
> >Why This May Not Be Ideal
> >
> >While the effort is appreciated, splitting the PR introduces several
> >drawbacks:
> >
> > - *Review context becomes fragmented* across multiple PRs.
> > - *Merge complexity increases*, especially when changes are
> >   interdependent.
> > - *Contributor and reviewer effort multiplies*, with more overhead and
> >   duplicated CI runs.
> > - *It sends a mixed message* to future contributors that PR splitting is
> >   preferred in these cases, which isn’t necessarily true.
> >
> >------------------------------
> >What Other ASF Projects Do
> >
> >Several other Apache projects handle large PRs by relying on *Git CLI-based
> >merges*, rather than splitting:
> >
> > - *Apache Arrow*: Encourages local rebases and merges for large
> >   contributions.
> > - *Apache Spark*: Merges and squashes are typically done via CLI;
> >   splitting is discouraged unless changes are logically separable.
> > - *Apache Kafka*: Maintainers use merge scripts
> >   <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
> >   to handle large PRs manually.
> > - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
> >   maintain history and bypass UI restrictions.
> >
> >This keeps reviews cohesive and simplifies the overall process for
> >contributors and committers alike.
> >------------------------------
> >✅ Recommended Best Practice for Apache Cloudberry
> >
> >To align with ASF norms and improve maintainability, I propose:
> >
> > 1. *Using Git CLI-based merges* as the standard method for large PRs
> >    (e.g., 100+ commits).
> > 2. *Discouraging contributors from splitting PRs* to work around UI
> >    limitations, unless explicitly requested by reviewers for clarity or
> >    modularity.
> > 3. *Documenting this workflow* in our committer guidelines to ensure
> >    consistency.
> >
> >------------------------------
> > Verified CLI Merge Workflow for Large PRs
> >
> ># 1. Fetch the PR branch directly from GitHub
> >git fetch origin pull/1002/head:pax-merge
> >
> ># 2. Optionally rebase for a linear history
> >git checkout pax-merge
> >git rebase origin/main
> >
> ># 3. Merge into main
> >git checkout main
> >git pull origin main
> >git merge pax-merge --no-ff
> >
> ># 4. Push the result to the repository
> >git push origin main
> >
> ># (Optional) Clean up
> >git branch -d pax-merge
> >
> >This approach avoids GitHub’s UI merge limitations, preserves commit
> >history, and maintains a better experience for both contributors and
> >reviewers.
> >------------------------------
> >
> >Would love to hear thoughts from the community. If there's agreement, we
> >should add contributing and committer workflows to our newly enabled wiki.
> >
> >Best regards,
> >-=e
> >Ed Espino
> >Apache Cloudberry (Incubating) & MADlib
>
--
Ed Espino
Apache Cloudberry (Incubating) & MADlib