Re: Call for GSoC and Outreachy project ideas for summer 2023

Warner Losh Wed, 08 Feb 2023 15:03:15 -0800

On Fri, Jan 27, 2023 at 3:02 PM Stefan Hajnoczi <stefa...@gmail.com> wrote:


> On Fri, 27 Jan 2023 at 12:10, Warner Losh <i...@bsdimp.com> wrote:
> >
> > [[ cc list trimmed to just qemu-devel ]]
> >
> > On Fri, Jan 27, 2023 at 8:18 AM Stefan Hajnoczi <stefa...@gmail.com> wrote:
> >>
> >> Dear QEMU, KVM, and rust-vmm communities,
> >> QEMU will apply for Google Summer of Code 2023
> >> (https://summerofcode.withgoogle.com/) and has been accepted into
> >> Outreachy May 2023 (https://www.outreachy.org/). You can now
> >> submit internship project ideas for QEMU, KVM, and rust-vmm!
> >>
> >> Please reply to this email by February 6th with your project ideas.
> >>
> >> If you have experience contributing to QEMU, KVM, or rust-vmm you can
> >> be a mentor. Mentors support interns as they work on their project.
> >> It's a great way to give back and you get to work with people who are
> >> just starting out in open source.
> >>
> >> Good project ideas are suitable for remote work by a competent
> >> programmer who is not yet familiar with the codebase. In
> >> addition, they are:
> >> - Well-defined - the scope is clear
> >> - Self-contained - there are few dependencies
> >> - Uncontroversial - they are acceptable to the community
> >> - Incremental - they produce deliverables along the way
> >>
> >> Feel free to post ideas even if you are unable to mentor the project.
> >> It doesn't hurt to share the idea!
> >
> >
> > I've been a GSoC mentor for the FreeBSD project on and off for maybe
> > 10-15 years now. I thought I'd share this for feedback here.
> >
> > My project idea falls between the two projects. I've been trying
> > to get bsd-user reviewed and upstreamed for some time now and my
> > time available to do the upstreaming has been greatly diminished lately.
> > It got me thinking: upstreaming is often more than just getting patches
> > reviewed. While there is a rather mechanical aspect to it (and I could
> > likely automate that aspect more), the real value of going through the
> > review process is that it points out things that had been done wrong,
> > things that need to be redone or refactored, etc. It's often these
> > suggestions that lead to the biggest investment of time on my part: Is
> > this idea good? If I do it, does it break things? Is the feedback right
> > about what's wrong, but wrong about how to fix it? etc. Plus the
> > inevitable: I thought this was a good idea, implemented it only to find
> > it broke other things, and how do I explain that and provide feedback to
> > the reviewer about that breakage to see if it is worth pursuing further
> > or not?
> >
> > So my idea for a project is threefold. First, to create scripts to
> > automate the upstreaming process: to break big files into bite-sized
> > chunks for review on this list. git publish does a great job from there.
> > The current backlog to upstream is approximately "175 files changed,
> > 30270 insertions(+), 640 deletions(-)", which is 300-600 patches at the
> > 50-100 line patch guidance I've been given. So even at 0.1 hr (6 minutes)
> > per patch (which is about 3x faster than I can do it by hand), that's up
> > to ~60 hours just to create the patches. Writing automation should take
> > much less time. Realistically, this is on the order of 10-20 hours to
> > get done.
> >
> > Second, it's to take feedback from the reviews for refactoring
> > the bsd-user code base (which will eventually land in upstream). I often
> > spend a few hours creating my patches each quarter, then about 10 or so
> > hours for the 30ish patches that I do processing the review feedback:
> > refactoring other things (typically other architectures), checking
> > details of other architectures (usually by looking at the FreeBSD
> > kernel), looking for ways to refactor to share code with linux-user
> > (though so far only the safe signals code is upstream; elf could be
> > too), chatting online about the feedback to better understand it, seeing
> > what I can mine from linux-user (since the code is derived from that,
> > but didn't pick up all the changes linux-user has), etc. This would be
> > on the order of 100 hours.
> >
> > Third, the testing infrastructure that exists for linux-user is not well
> > leveraged to test bsd-user. I've done some tests from time to time with
> > it, but it's not in a state that it can be used as, say, part of a CI
> > pipeline. In addition, the FreeBSD project has some very large jobs, a
> > subset of which could be used to further ensure that critical bits of
> > infrastructure don't break (or are working if not in a CI pipeline).
> > Things like building and using go, rust and the like are constantly
> > breaking for reasons too long to enumerate here. This job could be as
> > little as 50 hours to do a minimal job that is complete enough for CI,
> > or as much as 200 hours to do a more complete job that could be used to
> > bisect breakage more quickly and give good assurance that at any given
> > time bsd-user is useful and working.
> >
> > That's in addition to growing the number of people that can work on this
> > code and on the *-user code in general since they are quite similar.
> >
> > Some of these tasks are squarely in the qemu realm, while others are in
> > the FreeBSD realm, but that's similar to linux-user which requires very
> > heavy interfacing with the linux realm. It's just that a lot of that
> > work is already complete so the needs are substantially less there on an
> > ongoing basis. Since it does straddle the two projects, I'm unsure where
> > to propose this project be housed. But since this is a call for ideas, I
> > thought I'd float it to see what the feedback is. I'm happy to write
> > this up in a more formal sense if it would be seriously considered, but
> > want to get feedback as to what areas I might want to emphasize in such
> > a proposal.
> >
> > Comments?
>
> Hi Warner,
> Don't worry about it spanning FreeBSD and QEMU, you're welcome to list
> the project idea through QEMU. You can have co-mentors that are not
> part of the QEMU community in order to bring in additional FreeBSD
> expertise.
>
> My main thought is that getting all code upstream sounds like a
> sprawling project that likely won't be finished within one internship.
> Can you pick just a subset of what you described? It should be a
> well-defined project that depends minimally on other people finishing
> stuff or reaching agreement on something controversial. That way the
> intern will be able to come up with specific tasks for their project
> plan and there is little risk that they can't complete them due to
> outside factors.
>

I like this notion of limiting the scope. There are three or maybe four main
areas that I can call out. I got to thinking about all the details I have to
do for how I've been upstreaming things, and realized that there's a lot due
to the complicated history here...

> One way to go about this might be for you to define a milestone that
> involves completing, testing, and upstreaming just a subset of the
> out-of-tree code. For example, it might implement a limited set of
> core syscall families. The intern will then focus on delivering that
> instead of worrying about the daunting task of getting everything
> merged. Finishing this subset would advance bsd-user FreeBSD support
> by a useful degree (e.g. ability to run certain applications).
>
> Does that sound good?
>

Yes. I like this, but it's hard to know what that might be because many
things are hidden behind the scenes... But I'll try running a quick build to
see if I can gather enough stats to come up with a good set of tests... But
maybe I'll start with building 'hello world' with clang on armv7 running on
an amd64 host to see what's missing today. I also have an aarch64 set of
patches I might try hard to get in ASAP so that might be the target instead
(since it might be a bit more useful).
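
For a rough sense of scale, the patch-count estimate quoted above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function names are hypothetical, the 50-100 line guidance
and the 6 minutes per patch are the figures quoted above, and splitting on a
fixed line count ignores the logical boundaries a real patch series needs.

```python
import math


def estimate_patches(insertions, min_lines=50, max_lines=100):
    """Return (fewest, most) patches needed for `insertions` added lines.

    Hypothetical helper: assumes every patch is filled to the chosen size.
    """
    fewest = math.ceil(insertions / max_lines)
    most = math.ceil(insertions / min_lines)
    return fewest, most


def estimate_hours(patches, minutes_per_patch=6):
    """Hours of manual effort at a fixed cost per patch (6 min = 0.1 hr)."""
    return patches * minutes_per_patch / 60


def split_into_chunks(lines, max_lines=100):
    """Naively split a list of lines into consecutive chunks of at most
    `max_lines` lines. Real patches would be cut at logical boundaries."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    # Backlog figure quoted above: 30270 insertions.
    fewest, most = estimate_patches(30270)
    print(f"{fewest}-{most} patches")
    print(f"{estimate_hours(fewest):.1f}-{estimate_hours(most):.1f} hours by hand")
```

A real splitter would have to cut at function or syscall-family boundaries so
that each patch still builds on its own; the fixed-size split here only shows
the size of the problem.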

Warner


> Stefan
>
