On 5/6/19 11:06 AM, Laszlo Ersek wrote:
On 05/03/19 23:41, Brian J. Johnson wrote:
On 5/2/19 2:33 PM, sean.brogan via groups.io wrote:
Brian,

I would really like to hear about the challenges your team faced and
issues that caused those solutions to be unworkable.  Project Mu has
and continues to invest a lot in testing capabilities, build
automation, and finding ways to improve quality that scale.


Our products depend on a reference BIOS tree provided to us by a major
processor vendor.  That tree includes portions of Edk2, plus numerous
proprietary additions.  Each new platform starts with a new drop of
vendor code.  They provide additional drops throughout the platform's
life.  In the past these were distributed as zip files, but more
recently they have transitioned to git.  We end up having to make
extensive changes to their code to port it to our platform.  In
addition, we maintain internally several packages of code used on all
our platforms, designed to be platform-independent, plus a
platform-dependent package which is intended to be modified for each
platform.

When we first started using git, we looked for a way to share our
all-platform code among platforms, and move our platform-dependent code
easily to new platforms, while making it easy to integrate new drops
from our vendor.  We considered using git submodules, but decided that
would be too awkward.  Modifying code in a submodule involves committing
in the submodule, then committing in the module containing it.  This
seemed like too much trouble for our developers, who were all new to
git.  Plus, it didn't interact well at all with our internal bug
tracking system.  Basically, there was no good way to tie commits in
various sub- and super-modules together in a straightforward, trackable
way.

We tried a package called gitslave (http://gitslave.sourceforge.net/),
which automates running git commands across a super-repo and various
sub-repos, with some sugar for aggregating the results into a readable
whole.  It's a bit more transparent than submodules.  But at the end of
the day, you're still trying to coordinate multiple git repositories. We
gave it a try for a month or two, but having to manage multiple
repositories for day-to-day work, and the lack of a single commit
history spanning the entire tree doomed that scheme.  Developers rebelled.

Ever since, we've used a single git repo per platform.  We keep the
vendor code in a "base" branch, which we update as they provide drops,
then merge into our master branch.  When we start a new platform, we use
git filter-branch to extract our all-platform and platform-dependent
code into a new branch, which we move to the new platform's repo and
merge into master.  It's possible to re-extract the code if we need to
pick up updates.  This doesn't provide total flexibility... for
instance, backporting a fix in our all-platform code back to a previous
platform involves manual cherry-picking.
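
A minimal sketch of the base-branch workflow described above, written as a Python helper that only builds the git command sequence. The branch names "base" and "master" come from the description; the function name, the commit message, and the way a drop is unpacked are assumptions for illustration, not the actual scripts.

```python
from typing import List


def import_drop_plan(drop_label: str,
                     base: str = "base",
                     main: str = "master") -> List[List[str]]:
    """Return the git commands for importing one vendor drop.

    Hypothetical helper: it only describes the steps, it does not run them.
    The drop itself is assumed to be unpacked over the working tree between
    the checkout of the base branch and the "git add -A" step.
    """
    return [
        ["git", "checkout", base],        # switch to the vendor-only branch
        ["git", "add", "-A"],             # stage the unpacked drop, deletions included
        ["git", "commit", "-m", f"Import vendor drop {drop_label}"],
        ["git", "checkout", main],        # back to the development branch
        ["git", "merge", base],           # merge the new drop into local work
    ]


plan = import_drop_plan("example-drop-01")
```

Each entry could be passed to subprocess.run(cmd, check=True) once the drop has been unpacked; conflict resolution during the final merge stays manual.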

Good point -- and cherry-picking is a first class citizen in the git
toolset. Upstream projects use it all the time, between their master and
stable branches. And we (RH) happen to use it all the time too. "git
cherry-pick -s -x" (possibly "-e" too) is the main tool for backporting
upstream patches to downstream branches.
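
As a rough illustration of that backport step, the sketch below builds the "git cherry-pick -s -x" command line from Python. The flags are the ones named above (-s adds a Signed-off-by line, -x records the original commit ID in the message, -e opens the editor); the function names and the example commit ID are hypothetical.

```python
import subprocess
from typing import List, Sequence


def cherry_pick_command(commits: Sequence[str], edit: bool = False) -> List[str]:
    """Build the backport command; does not touch any repository."""
    cmd = ["git", "cherry-pick", "-s", "-x"]
    if edit:
        cmd.append("-e")  # let the backporter amend the commit message
    cmd.extend(commits)
    return cmd


def backport(repo_dir: str, commits: Sequence[str], edit: bool = False) -> None:
    """Run the cherry-pick in repo_dir; raises CalledProcessError on conflicts."""
    subprocess.run(cherry_pick_command(commits, edit), cwd=repo_dir, check=True)


example = cherry_pick_command(["0123abc"])
```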

But for day-to-day development,
it lets us work in a single git tree, with a bisectable history, working
git-blame, commit IDs which tie directly to our bug tracker, and no
external tooling.  It's a bit of a pain to merge a new drop (shell
scripts are our friends), but we're optimizing for ease of local
development.  That seems like the best use of our resources.

So I'm leery of any scheme which involves multiple repos managed by an
external tool.  It sounds like a difficult way to do day-to-day
development.  If Edk2 does move to split repos, we could filter-branch
and merge them all together into a single branch for internal use, I
suppose.  But that does make it harder to push fixes upstream.

Even if that re-merging worked in practice, and even if two consumers of
edk2 followed the exact same procedure for re-unifying the repo, they
would still end up with different commit hashes -- and that would make
it more difficult to reference the same commits in upstream discussion.
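
To see why the hashes diverge, here is a small sketch of how git derives a commit ID in a SHA-1 repository: the ID is the SHA-1 of the commit object, which covers the committer identity and timestamp as well as the tree. The names, addresses, and timestamps below are made up for the example.

```python
import hashlib
from typing import Sequence


def git_object_id(kind: str, body: bytes) -> str:
    # git hashes "<kind> <size>" + NUL + body to name an object
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: Sequence[str], author: str,
              committer: str, message: str) -> str:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


EMPTY_TREE = git_object_id("tree", b"")  # the well-known empty-tree ID

# Two consumers create a commit with identical content and message,
# but each with their own committer identity and time.
alice = commit_id(EMPTY_TREE, [], "Dev <dev@example.com> 1557150000 +0000",
                  "Alice <alice@example.com> 1557150000 +0000", "Unify repos")
bob = commit_id(EMPTY_TREE, [], "Dev <dev@example.com> 1557150000 +0000",
                "Bob <bob@example.com> 1557153600 +0000", "Unify repos")
```

Here alice and bob differ even though the tree and message are identical, and every descendant commit inherits the difference through its parent line.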


Yes, we end up having to cherry-pick (or more likely, outright port) any changes we want to send upstream back onto the upstream branch(es). That is one reason we don't do a lot of it.

(Not that we end up doing a lot of that... we're not developing an
open-source BIOS, just making use of open-source upstream components. So
our use case is quite a bit different from Laszlo's.)  We're also
generally focusing on one platform at a time, not trying to update
shared code across many at once.  So our use case may be different from
Sean's.

This got rather long... I hope it helps explain where we're coming from.

It's very educational to me -- I don't have to deal with "ZIP drops"
from vendors, and I'm impressed by the "commit vendor drop on side
branch, merge into master separately" workflow.

How difficult have your git-merges been? (You mention shell scripts.)
Have you found a correlation between merge difficulty and vendor drop
frequency? (I'd expect the less frequently new code is dropped, the
harder the merge is.)


In general, yes, the less frequently code is dropped, the greater the merge effort, and the greater the likelihood of merge mistakes. Our vendor has begun releasing much more frequently than they used to, which is generally a good thing. But there tends to be a minimum level of effort required for any drop, so if the drops are very frequent, we end up with someone doing merges pretty much full time.

One project I'm working on involves four separate upstream repos, which require individual filter-branch scripts to extract and reorganize code into staging repos, plus an additional script to pull all the results together into the final base branch. Then we can merge that to our master. Sigh... git isn't supposed to be this complicated. But at least it gives us the machinery to do what we need to. And most of our developers don't need to worry about all the merge hassles.

At RH, we generally rebase our product branches on new upstream fork-off
points (typically stable releases), instead of merging. (And, this
strategy applies to more projects than just edk2.)

Downstream, we don't create merge commits -- the downstream branches
(consisting of a handful of downstream-only commits, and a large number
of upstream backports, as time passes) have a linear history. The
"web-like" git history is inherited from upstream up to the new fork-off
point (= an upstream stable tag). The linear nature of the downstream
branches is very suitable for "RPM", where you have a base tarball (a
flat source tree generated at the upstream tag), plus a list of
downstream patches that can be applied in strict (linear) sequence, for
binary package building.
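
A linear downstream branch of that kind maps naturally onto a numbered patch series. The sketch below is only an illustration using stock git commands ("git format-patch" to export, "git am" to replay); the tag and branch names are placeholders, and it is not a description of the actual RPM tooling.

```python
from typing import List, Sequence


def export_series_command(upstream_tag: str, downstream_branch: str,
                          out_dir: str = "patches") -> List[str]:
    """Command that writes one numbered patch file per downstream commit."""
    return ["git", "format-patch", "-o", out_dir,
            f"{upstream_tag}..{downstream_branch}"]


def apply_series_command(patch_files: Sequence[str]) -> List[str]:
    """Command that replays the patches in strict numeric order.

    format-patch prefixes files with 0001-, 0002-, ... so a plain sort
    restores the original commit order.
    """
    return ["git", "am", *sorted(patch_files)]


export_cmd = export_series_command("upstream-stable-tag", "downstream-branch")
apply_cmd = apply_series_command(["0002-second.patch", "0001-first.patch"])
```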


Unfortunately, our downstreams end up with many (probably thousands, I haven't counted) changes to the base code, not even counting the new code we add. So rebasing isn't an attractive option for us, and a patch-based development process just isn't feasible.

I guess the takeaway is that Edk2 is used in many ways by many different people. So it's good to keep everyone in the discussion.

Thanks!
Laszlo


--
Brian J. Johnson
Enterprise X86 Lab

Hewlett Packard Enterprise

brian.john...@hpe.com

