Arg... accidental send before ready. What do you think about the statement below for community health? Does it fairly capture the concerns/perspective?
On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau <jacq...@apache.org> wrote:

> Many contributors are struggling with the slowness of pre-commit CI. Arrow
> has a large number of different platforms and components and a complex
> build matrix. As new commits come in, their builds frequently take a long
> time to complete. The community is trying several ways to solve this. Some
> of those have been:
>
>    - Try to use CircleCI, rejected in INFRA-15964
>    <https://issues.apache.org/jira/browse/INFRA-15964>
>    - Try to use Azure Pipelines, rejected in INFRA-17030
>    - Try to resolve issues with Travis CI capacity: INFRA-18533
>    <https://issues.apache.org/jira/browse/INFRA-18533>,
>    https://s.apache.org/ci-capacity (no resolution beyond "find
>    donations")
>    - The creation of new infrastructure design (in progress but a huge
>    amount of thankless work)
>
> There is bubbling frustration in the community around the GitHub repo
> rules for using third party services. This is especially challenging when
> there are free solutions to relieve the community pressure but the
> community is unable to access these resources. This frustration is greatest
> among people who work on many OSS projects which don't have such
> restrictive rules around GitHub.
>
> On Thu, Oct 10, 2019 at 5:36 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Here is a rejection of CircleCI from more than 18 months ago
>>
>> https://issues.apache.org/jira/browse/INFRA-15964
>>
>> On Thu, Oct 10, 2019 at 4:33 AM Antoine Pitrou <anto...@python.org> wrote:
>> >
>> > For the record, here is the ticket for Azure Pipelines integration:
>> > https://issues.apache.org/jira/browse/INFRA-17030
>> >
>> > I opened an issue back in May about the Travis-CI capacity situation:
>> > https://issues.apache.org/jira/browse/INFRA-18533
>> >
>> > Apparently CI capacity has been a "hot topic as of late":
>> > https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E
>> >
>> > (I didn't know this list -- bui...@apache.org -- existed, by the way)
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > On 10/10/2019 at 07:34, Wes McKinney wrote:
>> > > On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <jacq...@apache.org> wrote:
>> > >>
>> > >> I'm not dismissing that there are issues but I also don't feel like
>> > >> there has been constant discussion for months on the list that INFRA
>> > >> is not being responsive to Arrow community requests. It seems like
>> > >> you might be saying one of two things (or both?):
>> > >>
>> > >> 1) The Arrow infrastructure requirements are vastly different than
>> > >> other projects. Because of Arrow's specialized requirements, we need
>> > >> things that no other project needs.
>> > >> 2) There are many projects that want CircleCI, Buildkite and Azure
>> > >> pipelines but Infrastructure is not responsive. This is putting a big
>> > >> damper on the success of the Arrow project.
>> > >
>> > > Yes, I'm saying both of these things.
>> > >
>> > > 1.
>> > > Yes, Arrow is special -- validating the project requires running a
>> > > dozen or more different builds (with dozens more nightly builds) that
>> > > test different parts of the project. Different language components, a
>> > > large and diverse packaging matrix, and interproject integration tests
>> > > and integration with external projects (e.g. Apache Spark and others)
>> > >
>> > > 2. Yes, the limited GitHub App availability is hurting us.
>> > >
>> > > I'm OK to place this concern in the "Community Health" section and
>> > > spend more time building a comprehensive case about how Infra's
>> > > conservatism around Apps is causing us to work with one hand tied
>> > > behind our back. I know that I'm not the only one who is unhappy, but
>> > > I'll let the others speak for themselves.
>> > >
>> > >> For each of these, if we're asking the board to do something, we
>> > >> should say more and more clearly. Sure, CI is a pain in the Arrow
>> > >> project's a**. I also agree that community health is impacted by the
>> > >> challenge to merge things. I also share the perspective that the
>> > >> foundation has been slow to adopt new technologies and has been way
>> > >> too religious about svn. However, if we're asking the board to do
>> > >> something, what is it?
>> > >
>> > > Allow GitHub Apps that do not require write access to the code itself,
>> > > set up appropriate checks and balances to ensure that the Foundation's
>> > > IP provenance webhooks are preserved.
>> > >
>> > >> Looking at the two things you might be saying...
>> > >> If 1, are we confident in that? Many other projects have pretty
>> > >> complex build matrices I think. (I haven't thought about this and
>> > >> evaluated the other projects...maybe it is true.) If 1, we should
>> > >> clarify why we think we're different. If that is the case, what are
>> > >> we asking for from the board?
>> > >>
>> > >> If 2, and you are proposing throwing stones at INFRA, we should back
>> > >> it up with INFRA tickets and numbers (e.g. how many projects have
>> > >> wanted these things and for how long). We should reference multiple
>> > >> threads on the INFRA mailing list where we voiced certain concerns
>> > >> and many other people voiced similar concerns and INFRA turned a deaf
>> > >> ear or blind eye (maybe these exist, I haven't spent much time on the
>> > >> INFRA list lately). As it stands, the one ticket referenced in this
>> > >> thread is a ticket that has only one project asking for a new
>> > >> integration that has been open for less than a week. That may be
>> > >> annoying but it doesn't seem like something that has gotten to the
>> > >> level that we need to get the board's help.
>> > >>
>> > >> In a nutshell, I agree that this is impacting the health and growth
>> > >> of the project but think we should cover that in the community health
>> > >> section of the report. I'm less a fan of saying this is an issue the
>> > >> board needs to help us solve unless it has been a constant point of
>> > >> pain that we've attempted to elevate multiple times in infra forums
>> > >> and experienced unreasonable responses. The board is a blunt
>> > >> instrument and should only be used when we have depleted every other
>> > >> avenue for resolution.
>> > >>
>> > >
>> > > Yes, I'm happy to spend more time building a comprehensive case before
>> > > escalating it to the board level. However, Apache Arrow is a high
>> > > profile project and it is not a good look to have a PMC in a
>> > > fast-growing project growing disgruntled with the Foundation's
>> > > policies in this way. We've been struggling visibly for a long time
>> > > with our CI scalability, and I think we should have all the options on
>> > > the table to utilize GitHub-integrated tools to help us find a way out
>> > > of the mess that we are in.
>> > >
>> > >>
>> > >> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >>
>> > >>> hi Jacques,
>> > >>>
>> > >>> I think we need to share the concerns that many PMC members have over
>> > >>> the constraints that INFRA is placing on us. Can we rephrase the
>> > >>> concern in a way that is more helpful?
>> > >>>
>> > >>> Firstly, I respect and appreciate the ASF's desire to limit write
>> > >>> access to committers only from an IP provenance perspective. I
>> > >>> understand that GitHub webhooks are used to log actions taken in
>> > >>> repositories to secure IP provenance. I do not think a third party
>> > >>> application should be given the ability to commit or modify a
>> > >>> repository -- all write operations on the .git repository should be
>> > >>> initiated by committers.
>> > >>>
>> > >>> However, GitHub is the main platform for producing open source
>> > >>> software, and tools are being created to help produce open source more
>> > >>> efficiently. It is frustrating for us to not be able to take advantage
>> > >>> of the tools that are available to everyone else on GitHub. I brought
>> > >>> up the recent request about Buildkite as being representative of this
>> > >>> (after learning that Google has been making a lot of use of it), but
>> > >>> we have previously been denied use of CircleCI and Azure Pipelines
>> > >>> since those services require even more permissions (AFAIK) than in the
>> > >>> case of Buildkite. From our use in
>> > >>> https://github.com/ursa-labs/crossbow, CircleCI and Azure seem to be a
>> > >>> lot better than Travis CI and Appveyor.
>> > >>>
>> > >>> I think the ASF is going to face an existential crisis in the near
>> > >>> future over whether it wants to live in 2020 or 2000.
>> > >>> It feels like GitHub is treated somewhat as ersatz SVN "because
>> > >>> people want to use git + GitHub instead of SVN"
>> > >>>
>> > >>> In the same way that the cloud revolutionized software startups,
>> > >>> enabling small groups of developers to build large SaaS applications,
>> > >>> the same kind of leverage is becoming available to open source
>> > >>> developers to set up infrastructure to automate and scale open source
>> > >>> projects. I think projects considering joining the Foundation are
>> > >>> going to look at these issues around App usage and decide that they
>> > >>> would rather be in control of their own infrastructure.
>> > >>>
>> > >>> I can set aside even more time and money from my non-profit
>> > >>> organization's modest budget to do CI work for Apache Arrow. The
>> > >>> amount that we have invested already is very large, and continues to
>> > >>> grow. I'm raising these issues because as a Member of the Foundation
>> > >>> I'm concerned that fast-growing projects like ours are not being
>> > >>> adequately served by INFRA, and we probably aren't the only project
>> > >>> that will face these issues. All that is needed is for INFRA to let us
>> > >>> use third party GitHub Apps and monitor any potentially destructive
>> > >>> actions that they may take, such as modifying unrelated repository
>> > >>> webhooks related to IP provenance.
>> > >>>
>> > >>> - Wes
>> > >>>
>> > >>> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> > >>>>
>> > >>>> I think we need to be more direct in listing issues for the board.
>> > >>>>
>> > >>>> What have we done? What do we want them to do?
>> > >>>>
>> > >>>> In general, any large org is going to be slow to add new deep
>> > >>>> integrations into GitHub. I don't think we should expect Apache to be
>> > >>>> any different (it took several years before we could merge things
>> > >>>> through github for example).
>> > >>>> If I were on the INFRA side, I think I would look and see how many
>> > >>>> different people are asking for BuildKite before considering
>> > >>>> integration. It seems like we only opened the JIRA 6 days ago and no
>> > >>>> other projects have requested access to this?
>> > >>>>
>> > >>>> I'm not clear why this is a board issue. What do we think the board
>> > >>>> can do for us that we can't solve ourselves and need them to solve?
>> > >>>> Remember, a board solution to a problem is typically very removed
>> > >>>> from what matters to individuals on a project.
>> > >>>>
>> > >>>>
>> > >>>> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >>>>
>> > >>>>> New draft
>> > >>>>>
>> > >>>>> ## Description:
>> > >>>>> The mission of Apache Arrow is the creation and maintenance of
>> > >>>>> software related to columnar in-memory processing and data
>> > >>>>> interchange
>> > >>>>>
>> > >>>>> ## Issues:
>> > >>>>>
>> > >>>>> * We are struggling with Continuous Integration scalability as the
>> > >>>>>   project has definitely outgrown what Travis CI and Appveyor can do
>> > >>>>>   for us. Some contributors have shown reluctance to submit patches
>> > >>>>>   they aren't sure about because they don't want to pile on the build
>> > >>>>>   queue. We are exploring alternative solutions such as Buildbot,
>> > >>>>>   Buildkite, and GitHub Actions to provide a path to migrate away
>> > >>>>>   from Travis CI / Appveyor. In our request to Infrastructure
>> > >>>>>   (INFRA-19217), some of us were alarmed to find that a CI/CD service
>> > >>>>>   like Buildkite may not be able to be connected to the @apache
>> > >>>>>   GitHub account on account of requiring admin access to repository
>> > >>>>>   webhooks, but no ability to modify source code.
>> > >>>>>   There are workarounds (building custom OAuth bots) that could
>> > >>>>>   enable us to use Buildkite, but it would require extra development
>> > >>>>>   and result in a less refined experience for community members.
>> > >>>>>
>> > >>>>> ## Membership Data:
>> > >>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>> > >>>>> * There are currently 48 committers and 28 PMC members in this project.
>> > >>>>> * The Committer-to-PMC ratio is roughly 3:2.
>> > >>>>>
>> > >>>>> Community changes, past quarter:
>> > >>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>> > >>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>> > >>>>> - Ben Kietzman was added as committer on 2019-09-07
>> > >>>>> - David Li was added as committer on 2019-08-30
>> > >>>>> - Kenta Murata was added as committer on 2019-09-05
>> > >>>>> - Neal Richardson was added as committer on 2019-09-05
>> > >>>>> - Praveen Kumar was added as committer on 2019-07-14
>> > >>>>>
>> > >>>>> ## Project Activity:
>> > >>>>>
>> > >>>>> * The project has just made a 0.15.0 release.
>> > >>>>> * We are discussing ways to make the Arrow libraries as accessible as
>> > >>>>>   possible to downstream projects for minimal use cases while
>> > >>>>>   allowing the development of more comprehensive "standard libraries"
>> > >>>>>   with larger dependency stacks in the project
>> > >>>>> * We plan to make a 1.0.0 release as our next major release, at which
>> > >>>>>   time we will declare that the Arrow binary protocol is stable with
>> > >>>>>   forward and backward compatibility guarantees
>> > >>>>>
>> > >>>>> ## Community Health:
>> > >>>>>
>> > >>>>> * The community is overall healthy, with the aforementioned concerns
>> > >>>>>   around CI scalability.
>> > >>>>>   New contributors frequently take notice of the long build queue
>> > >>>>>   times when submitting pull requests.
>> > >>>>>
>> > >>>>> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>> Yes, I agree with raising the issue to the board.
>> > >>>>>>
>> > >>>>>> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <anto...@python.org> wrote:
>> > >>>>>>>
>> > >>>>>>> I agree. Especially given that the constraints imposed by Infra
>> > >>>>>>> don't help solve the problem.
>> > >>>>>>>
>> > >>>>>>> Regards
>> > >>>>>>>
>> > >>>>>>> Antoine.
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On 08/10/2019 at 15:02, Uwe L. Korn wrote:
>> > >>>>>>>> I'm not sure what qualifies for "board attention" but it seems
>> > >>>>>>>> that CI is a critical problem in Apache projects, not just Arrow.
>> > >>>>>>>> Should we raise that?
>> > >>>>>>>>
>> > >>>>>>>> Uwe
>> > >>>>>>>>
>> > >>>>>>>> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
>> > >>>>>>>>> Here is a start for our Q3 board report
>> > >>>>>>>>>
>> > >>>>>>>>> ## Description:
>> > >>>>>>>>> The mission of Apache Arrow is the creation and maintenance of
>> > >>>>>>>>> software related to columnar in-memory processing and data
>> > >>>>>>>>> interchange
>> > >>>>>>>>>
>> > >>>>>>>>> ## Issues:
>> > >>>>>>>>> There are no issues requiring board attention at this time
>> > >>>>>>>>>
>> > >>>>>>>>> ## Membership Data:
>> > >>>>>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>> > >>>>>>>>> * There are currently 48 committers and 28 PMC members in this
>> > >>>>>>>>>   project.
>> > >>>>>>>>> * The Committer-to-PMC ratio is roughly 3:2.
>> > >>>>>>>>>
>> > >>>>>>>>> Community changes, past quarter:
>> > >>>>>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>> > >>>>>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>> > >>>>>>>>> - Ben Kietzman was added as committer on 2019-09-07
>> > >>>>>>>>> - David Li was added as committer on 2019-08-30
>> > >>>>>>>>> - Kenta Murata was added as committer on 2019-09-05
>> > >>>>>>>>> - Neal Richardson was added as committer on 2019-09-05
>> > >>>>>>>>> - Praveen Kumar was added as committer on 2019-07-14
>> > >>>>>>>>>
>> > >>>>>>>>> ## Project Activity:
>> > >>>>>>>>>
>> > >>>>>>>>> * The project has just made a 0.15.0 release.
>> > >>>>>>>>> * We are discussing ways to make the Arrow libraries as
>> > >>>>>>>>>   accessible as possible to downstream projects for minimal use
>> > >>>>>>>>>   cases while allowing the development of more comprehensive
>> > >>>>>>>>>   "standard libraries" with larger dependency stacks in the
>> > >>>>>>>>>   project
>> > >>>>>>>>> * We plan to make a 1.0.0 release as our next major release, at
>> > >>>>>>>>>   which time we will declare that the Arrow binary protocol is
>> > >>>>>>>>>   stable with forward and backward compatibility guarantees
>> > >>>>>>>>> * We are struggling with Continuous Integration scalability as
>> > >>>>>>>>>   the project has definitely outgrown what Travis CI and Appveyor
>> > >>>>>>>>>   can do for us. We are exploring alternative solutions such as
>> > >>>>>>>>>   Buildbot, Buildkite (see INFRA-19217), and GitHub Actions to
>> > >>>>>>>>>   provide a path to migrate away from Travis CI / Appveyor
>> > >>>>>>>>>
>> > >>>>>>>>> ## Community Health:
>> > >>>>>>>>>
>> > >>>>>>>>> * The community is overall healthy, with the aforementioned
>> > >>>>>>>>>   concerns around CI scalability.
>> > >>>>>>>>>   New contributors frequently take notice of the long build
>> > >>>>>>>>>   queue times when submitting pull requests.
>> > >>>>>>>>>
>> > >>>>>
>> > >>>
>> >