Thanks for the responses, Jon and Carlo! It makes sense to me to prevent future patches from re-introducing the terminology. I can file a JIRA to add the +1/-1 functionality to the precommit builds.
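For illustration only, here is a rough sketch of the kind of check such a JIRA could add. It is not the actual precommit tooling. The names `vote_on_diff` and `FLAGGED_TERMS` are hypothetical, and the term list is simply the set of terms discussed in this thread. It assumes the patch is available as unified diff text and only inspects lines the patch adds, so existing instances would not cause a -1.

```python
import re

# Hypothetical term list, taken from the terms discussed in this thread.
FLAGGED_TERMS = ("whitelist", "blacklist", "master", "slave")
_PATTERN = re.compile("|".join(FLAGGED_TERMS), re.IGNORECASE)


def vote_on_diff(diff_text):
    """Return (vote, offending_lines) for a unified diff.

    vote is -1 if any added line contains a flagged term, otherwise +1.
    """
    offending = []
    for line in diff_text.splitlines():
        # Look only at lines the patch adds; skip the "+++ b/file" header.
        if line.startswith("+") and not line.startswith("+++"):
            if _PATTERN.search(line):
                offending.append(line[1:].strip())
    return (-1 if offending else 1), offending
```

A plain substring match like this is crude (it would also flag unrelated uses of "master", for example), so a real check would likely need an exclusion list or a narrower pattern.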
As for splitting up the work, I think it will probably be easiest and cleanest to have an umbrella JIRA for each subproject of Hadoop (Hadoop Common, HDFS, YARN, MapReduce), with smaller tasks (e.g. whitelist/blacklist, master/slave) as subtasks of each umbrella. That way each expert can chime in on their respective area of expertise and the patches won't be gigantic. I can then link the umbrella JIRAs together so everything can be found easily. As Carlo pointed out, it's unclear whether fewer, larger patches are better or worse than more, smaller patches. But I think that, at least for the sake of manageability and getting this into Apache, smaller patches are likely easier.

Eric

On Thu, Jul 30, 2020 at 5:50 PM Carlo Aldo Curino <carlo.cur...@gmail.com> wrote:

> Thanks again Eric for leading the charge. As for whether to chop it up or
> keep it in fewer patches, I think it primarily impacts the conflict surface
> with dev branches and other in-flight development. More patches are likely
> to create more localized clashes (as in I clash with a smaller patch, which
> might be less daunting, though there are potentially more of them to deal
> with). I don't have a strong preference; maybe chunk it into reasonable
> packages, so that you can involve the right core group of committers to
> weigh in on each sub-area.
>
> Thanks,
> Carlo
>
> On Thu, Jul 30, 2020 at 1:20 PM Jonathan Eagles <jeag...@gmail.com> wrote:
>
> > Thanks, Eric. I like this proposal and I'm glad this work is getting
> > traction. A few thoughts on implementation.
> >
> > Once the fix is done, I think it will be necessary to ensure these
> > language restrictions are enforced at the patch level. This will +1/-1
> > patches that introduce terminology that violates our policy.
> >
> > As to splitting up the patches, it may be necessary to split these up
> > further in cases where feature experts need to weigh in on compatibility
> > (usually with regards to persistence or wire compatibility).
> > This can be done on a case-by-case basis.
> >
> > Regards,
> > jeagles
> >
> > On Thu, Jul 30, 2020 at 1:28 PM Eric Badger
> > <ebad...@verizonmedia.com.invalid> wrote:
> >
> >> I have created
> >> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/HADOOP-17168__;!!Op6eflyXZCqGR5I!XjCu5VSFdt2uqyuzlkc53KSBa6IM-M2Wun_FX6uD8fl99OAvaj9wb-0kz4fK$
> >> to remove non-inclusive terminology from Hadoop. However, I would like
> >> input on how to go about putting up patches. This umbrella JIRA is under
> >> Hadoop Common, but there are sure to be instances in YARN, HDFS, and
> >> MapReduce. Should I create an umbrella like this for each subproject? Or
> >> should I do all whitelist/blacklist fixes in a single JIRA that fixes
> >> them across all Hadoop subprojects?
> >>
> >> Thanks,
> >>
> >> Eric
> >>
> >> On Thu, Jul 30, 2020 at 8:47 AM Carlo Aldo Curino
> >> <carlo.cur...@gmail.com> wrote:
> >>
> >> > RE Mentorship: I think the Mentorship program is an interesting idea.
> >> > The concern with these efforts is always the follow-through. If you
> >> > can find a group of folks that are motivated and will work on this, I
> >> > think it could be a great idea, especially if you focus on a diverse
> >> > set of mentees, and the focus is on teaching not just code but a bit
> >> > of the "apache way" of interacting and conducting yourself in
> >> > open source.
> >> >
> >> > RE Diversity and representation: Wei-Chiu, I think you raise an
> >> > important problem. The main force behind this is typically for a
> >> > company to be deeply invested in a project, valuing OSS and putting
> >> > lots of full-time developers on it. Those will naturally become
> >> > committers. On one side this is good for the project, unless it
> >> > becomes so unbalanced that the OSS nature of the effort is in question.
> >> > Attracting more contributors across companies/countries (and any other
> >> > dimension of diversity) is important. @Vinod, I am sure you have been
> >> > thinking about this issue; any thoughts?
> >> >
> >> > Thanks,
> >> > Carlo
> >> >
> >> > On Fri, Jul 10, 2020 at 1:49 PM Ahmed Hussein <a...@ahussein.me> wrote:
> >> >
> >> >> +1, this is great folks.
> >> >>
> >> >> In addition to that initiative, do you think there is a chance to
> >> >> launch a "*Hadoop Mentorship Program for Minority Students*"?
> >> >>
> >> >> *The program will work as follows:*
> >> >>
> >> >>    - Define a programme committee to administrate and mentor
> >> >>    candidates.
> >> >>    - The Committee defines a timeline for applications and projects.
> >> >>    Let's say it is some sort of 3 months. (Similar to an internship)
> >> >>    - Define a list of ideas/projects that can be picked by the
> >> >>    candidates.
> >> >>    - Candidates can propose their idea as well. This can be a good
> >> >>    way to inject new blood and research ideas into Hadoop.
> >> >>    - Pick the top applications and assign them to mentors.
> >> >>    - If sponsors can allocate money, then candidates with good
> >> >>    evaluation can get some sort of prize. If no money is allocated,
> >> >>    then we can discuss any other kind of motivation.
> >> >>
> >> >> I remember there were Student Mentorship programmes in Open source
> >> >> projects like "JikesRVM" and several proposals were actually merged
> >> >> and/or transformed into publications.
> >> >> There are many missing links that need to be filled, like how to
> >> >> define the target and the audience of the programme.
> >> >>
> >> >> Let me know WDYT guys.
> >> >>
> >> >> On Fri, Jul 10, 2020 at 1:45 PM Wei-Chiu Chuang <weic...@apache.org>
> >> >> wrote:
> >> >>
> >> >>> Thanks Carlo and Eric for the initiative.
> >> >>>
> >> >>> I am all for it and I'll do my part to mind the code. This is a
> >> >>> small yet meaningful step we can take.
> >> >>> Meanwhile, I'd like to take this opportunity to open up conversation
> >> >>> around Diversity & Inclusion within the community.
> >> >>>
> >> >>> If you read this quarter's Hadoop board report, I am starting to
> >> >>> collect metrics about the composition of our community in order to
> >> >>> understand if we are building a diverse & inclusive community. Things
> >> >>> that are obvious to me that I thought I should report are the
> >> >>> following: affiliation among committers, and demographics of
> >> >>> committers. As of last quarter, 4 out of 7 newly minted committers
> >> >>> are affiliated with Cloudera. 4 out of the 7 said committers are
> >> >>> located in Asia. Those facts suggest we have good international
> >> >>> participation (I am being US-centric), which is good. However, having
> >> >>> half of the active committers affiliated with one company is a
> >> >>> potential problem.
> >> >>>
> >> >>> I'd like to hear your thoughts on this. What other metrics should we
> >> >>> collect, and what actions can we take?
> >> >>>
> >> >>> On Fri, Jul 10, 2020 at 11:29 AM Carlo Aldo Curino
> >> >>> <carlo.cur...@gmail.com> wrote:
> >> >>>
> >> >>> > Eric,
> >> >>> >
> >> >>> > Thank you so much for the support and for stepping up offering to
> >> >>> > work on this. I am super +1 on this. Let's give folks a few more
> >> >>> > days to chime in, in case there is anything to discuss before we
> >> >>> > get cracking!
> >> >>> >
> >> >>> > (Really) Thanks,
> >> >>> > Carlo
> >> >>> >
> >> >>> > On Fri, Jul 10, 2020, 10:38 AM Eric Badger
> >> >>> > <ebad...@verizonmedia.com> wrote:
> >> >>> >
> >> >>> > > Thanks for writing this up, Carlo.
> >> >>> > > I'm +1 (idk if I'm technically binding on this or not) for the
> >> >>> > > changes moving forward, and I think we should refactor away any
> >> >>> > > instances that are internal to the code (i.e. not APIs or other
> >> >>> > > things that would break compatibility) in all active branches and
> >> >>> > > then also change the APIs in trunk (an incompatible change).
> >> >>> > >
> >> >>> > > I just came across an internal issue related to the NM
> >> >>> > > whitelist/blacklist. I would be happy to go refactor the code and
> >> >>> > > look for instances of these and replace them with
> >> >>> > > allowlist/blocklist. Doing a quick "git grep" of trunk, I see 270
> >> >>> > > instances of "whitelist" and 1318 instances of "blacklist".
> >> >>> > >
> >> >>> > > If there are no objections, I'll create a JIRA to clean this
> >> >>> > > specific stuff up. It would be wonderful if others could pick up
> >> >>> > > a different portion (e.g. master/slave) so that we can spread the
> >> >>> > > work out.
> >> >>> > >
> >> >>> > > Eric
> >> >>> > >
> >> >>> > > On Tue, Jul 7, 2020 at 6:27 PM Carlo Aldo Curino
> >> >>> > > <carlo.cur...@gmail.com> wrote:
> >> >>> > >
> >> >>> > >> Hello Folks,
> >> >>> > >>
> >> >>> > >> I hope you are all doing well...
> >> >>> > >>
> >> >>> > >> *The problem*
> >> >>> > >> The recent protests made me realize that we are not just
> >> >>> > >> bystanders of the systematic racism that affects our society,
> >> >>> > >> but we are active participants in it. Being "non-racist" is not
> >> >>> > >> enough; I strongly feel we should be actively "anti-racist" in
> >> >>> > >> our day-to-day lives, and continuously check our biases.
> >> >>> > >> I assume most of you will agree with the general sentiment, but
> >> >>> > >> based on your exposure to the recent events and US
> >> >>> > >> culture/history might have more or less strong feelings about
> >> >>> > >> your role in the problem and potential solution.
> >> >>> > >>
> >> >>> > >> *What can we do about it?* I think a simple action we can take
> >> >>> > >> is to work on our code/comments/documentation/websites and
> >> >>> > >> remove racist terminology. Here is an IETF draft to fix up some
> >> >>> > >> of the most egregious examples (master/slave,
> >> >>> > >> whitelist/blacklist) with proposed alternatives.
> >> >>> > >>
> >> >>> > >> https://urldefense.com/v3/__https://tools.ietf.org/id/draft-knodel-terminology-00.html*rfc.section.1.1.1__;Iw!!Op6eflyXZCqGR5I!XjCu5VSFdt2uqyuzlkc53KSBa6IM-M2Wun_FX6uD8fl99OAvaj9wb5A12dpg$
> >> >>> > >> <https://urldefense.com/v3/__https://tools.ietf.org/id/draft-knodel-terminology-00.html*rfc.section.1.1.1__;Iw!!Op6eflyXZCqGR5I!W9THsx9iZb2VObBrVY5_8ZRJKCws3YRAXARB-YTUElcUtxOBPWpiHWfGaWE7Lbogn7k$>
> >> >>> > >>
> >> >>> > >> Also, as we go about this effort, we should consider other
> >> >>> > >> "non-inclusive" terminology issues around gender (e.g., binary
> >> >>> > >> gendered examples, "Alice" doing the wrong security thing
> >> >>> > >> systematically), and ableism (e.g., referring to misbehaving
> >> >>> > >> hardware as "lame" or "limping", etc.).
> >> >>> > >> The easiest action item is to avoid this going forward (ideally
> >> >>> > >> adding it to the checkstyles if possible); a more costly one is
> >> >>> > >> to start going back and refactoring away existing instances.
> >> >>> > >>
> >> >>> > >> I know this requires a bunch of work as refactorings might break
> >> >>> > >> dev branches and non-committed patches, possibly scripts, etc.
> >> >>> > >> but I think this is something important and relatively simple
> >> >>> > >> we can do. The effect goes well beyond some text in GitHub; it
> >> >>> > >> signals what we believe in, and forces hundreds of users and
> >> >>> > >> contributors to notice and think about it. Our force-multiplier
> >> >>> > >> is huge and it matches our responsibility.
> >> >>> > >>
> >> >>> > >> What do you folks think?
> >> >>> > >>
> >> >>> > >> Thanks,
> >> >>> > >> Carlo
> >> >>
> >> >> --
> >> >> Best Regards,
> >> >>
> >> >> *Ahmed Hussein, PhD*