Hi Eric and Carlo,

Thanks for taking the initiative! I am willing to take up this task to
improve the Ozone codebase.
I have cloned the task and sub-tasks for Ozone:
https://issues.apache.org/jira/browse/HDDS-4050

- Vivek Subramanian

On Thu, Jul 30, 2020 at 3:54 PM Eric Badger
<ebad...@verizonmedia.com.invalid> wrote:

> Thanks for the responses, Jon and Carlo!
>
> It makes sense to me to prevent future patches from re-introducing the
> terminology. I can file a JIRA to add the +1/-1 functionality to the
> precommit builds.
>
> As for splitting up the work, I think it'll probably be easiest and
> cleanest to have an umbrella for each subproject of Hadoop (Hadoop, HDFS,
> YARN, Mapreduce) with smaller tasks (e.g. whitelist/blacklist,
> master/slave) as subtasks of each umbrella. That way each expert can chime
> in on their respective area of expertise and the patches won't be
> gigantic. I can then link the umbrella JIRAs together so everything can be
> found easily. As Carlo pointed out, it's unclear whether fewer, larger
> patches are better or worse than more, smaller patches. But I think that,
> at least for the sake of manageability and getting this into Apache,
> smaller patches are likely easier.
>
> Eric
>
> On Thu, Jul 30, 2020 at 5:50 PM Carlo Aldo Curino <carlo.cur...@gmail.com>
> wrote:
>
> > Thanks again Eric for leading the charge. As for whether to chop it up
> > or keep it in fewer patches, I think it primarily impacts the conflict
> > surface with dev branches and other in-flight development. More patches
> > are likely to create more localized clashes (as in, I clash with a
> > smaller patch, which might be less daunting, though there are
> > potentially more of them to deal with). I don't have a strong
> > preference; maybe chunk it into reasonable packages, so that you can
> > involve the right core group of committers to weigh in on each sub-area.
> >
> > Thanks,
> > Carlo
> >
> > On Thu, Jul 30, 2020 at 1:20 PM Jonathan Eagles <jeag...@gmail.com>
> > wrote:
> >
> > > Thanks, Eric. I like this proposal and I'm glad this work is getting
> > > traction.
> > > A few thoughts on implementation.
> > >
> > > Once the fix is done, I think it will be necessary to ensure these
> > > language restrictions are enforced at the patch level. This will +1/-1
> > > patches that introduce terminology that violates our policy.
> > >
> > > As to splitting up the patches, it may be necessary to split these up
> > > further in cases where feature experts need to weigh in on
> > > compatibility (usually with regard to persistence or wire
> > > compatibility). This can be done on a case-by-case basis.
> > >
> > > Regards,
> > > jeagles
> > >
> > > On Thu, Jul 30, 2020 at 1:28 PM Eric Badger
> > > <ebad...@verizonmedia.com.invalid> wrote:
> > >
> > >> I have created https://issues.apache.org/jira/browse/HADOOP-17168 to
> > >> remove non-inclusive terminology from Hadoop. However, I would like
> > >> input on how to go about putting up patches. This umbrella JIRA is
> > >> under Hadoop Common, but there are sure to be instances in YARN, HDFS,
> > >> and Mapreduce. Should I create an umbrella like this for each
> > >> subproject? Or should I do all whitelist/blacklist fixes in a single
> > >> JIRA that fixes them across all Hadoop subprojects?
> > >>
> > >> Thanks,
> > >>
> > >> Eric
> > >>
> > >> On Thu, Jul 30, 2020 at 8:47 AM Carlo Aldo Curino
> > >> <carlo.cur...@gmail.com> wrote:
> > >>
> > >> > RE Mentorship: I think the Mentorship program is an interesting idea.
> > >> > The concern with these efforts is always the follow-through.
> > >> > If you can find a group of folks that are motivated and will work on
> > >> > this I think it could be a great idea, especially if you focus on a
> > >> > diverse set of mentees, and the focus is on teaching not just code
> > >> > but a bit of the "apache way" of interacting and conducting yourself
> > >> > in open-source.
> > >> >
> > >> > RE Diversity and representation: Wei-Chiu I think you raise an
> > >> > important problem. The main force behind this is typically for a
> > >> > company to be deeply invested in a project and valuing OSS and
> > >> > putting lots of full-time developers on it. Those will naturally
> > >> > become committers. On one side this is good for the project, unless
> > >> > it becomes so unbalanced that the OSS nature of the effort is in
> > >> > question. Attracting more contributors across companies/countries
> > >> > (and any other dimension of diversity) is important.
> > >> > @Vinod I am sure you have been thinking about this issue, any
> > >> > thoughts?
> > >> >
> > >> > Thanks,
> > >> > Carlo
> > >> >
> > >> > On Fri, Jul 10, 2020 at 1:49 PM Ahmed Hussein <a...@ahussein.me>
> > >> > wrote:
> > >> >
> > >> >> +1, this is great folks.
> > >> >>
> > >> >> In addition to that initiative, do you think there is a chance to
> > >> >> launch a "*Hadoop Mentorship Program for Minority Students*"?
> > >> >>
> > >> >> *The program will work as follows:*
> > >> >>
> > >> >>    - Define a programme committee to administer and mentor
> > >> >>      candidates.
> > >> >>    - The Committee defines a timeline for applications and projects.
> > >> >>      Let's say it is some sort of 3 months. (Similar to an
> > >> >>      internship)
> > >> >>    - Define a list of ideas/projects that can be picked by the
> > >> >>      candidates.
> > >> >>    - Candidates can propose their idea as well. This can be a good
> > >> >>      way to inject new blood and research ideas into Hadoop.
> > >> >>    - Pick top applications and assign them to mentors.
> > >> >>    - If sponsors can allocate money, then candidates with good
> > >> >>      evaluation can get some sort of prize. If no money is
> > >> >>      allocated, then we can discuss any other kind of motivation.
> > >> >>
> > >> >> I remember there were Student Mentorship programmes in Open source
> > >> >> projects like "JikesRVM" and several proposals were actually merged
> > >> >> and/or transformed into publications.
> > >> >> There are many missing links that need to be filled, like how to
> > >> >> define the target and the audience of the programme.
> > >> >>
> > >> >> Let me know WDYT guys.
> > >> >>
> > >> >> On Fri, Jul 10, 2020 at 1:45 PM Wei-Chiu Chuang <weic...@apache.org>
> > >> >> wrote:
> > >> >>
> > >> >>> Thanks Carlo and Eric for the initiative.
> > >> >>>
> > >> >>> I am all for it and I'll do my part to mind the code. This is a
> > >> >>> small yet meaningful step we can take. Meanwhile, I'd like to take
> > >> >>> this opportunity to open up conversation around the Diversity &
> > >> >>> Inclusion within the community.
> > >> >>>
> > >> >>> If you read this quarter's Hadoop board report, I am starting to
> > >> >>> collect metrics about the composition of our community in order to
> > >> >>> understand if we are building a diverse & inclusive community.
> > >> >>> Things that are obvious to me that I thought I should report are
> > >> >>> the following: affiliation among committers, and demographics of
> > >> >>> committers. As of last quarter, 4 out of 7 newly minted committers
> > >> >>> are affiliated with Cloudera. 4 out of the 7 said committers are
> > >> >>> located in Asia. Those facts suggest we have a good international
> > >> >>> participation (I am being US-centric), which is good. However,
> > >> >>> having half of the active committers affiliated with one company
> > >> >>> is a potential problem.
> > >> >>>
> > >> >>> I'd like to hear your thoughts on this. What other metrics should
> > >> >>> we collect, and what actions can we take?
> > >> >>>
> > >> >>> On Fri, Jul 10, 2020 at 11:29 AM Carlo Aldo Curino
> > >> >>> <carlo.cur...@gmail.com> wrote:
> > >> >>>
> > >> >>> > Eric,
> > >> >>> >
> > >> >>> > Thank you so much for the support and for stepping up offering to
> > >> >>> > work on this. I am super +1 on this. Let's give folks a few more
> > >> >>> > days to chime in, in case there is anything to discuss before we
> > >> >>> > get cracking!
> > >> >>> >
> > >> >>> > (Really) Thanks,
> > >> >>> > Carlo
> > >> >>> >
> > >> >>> > On Fri, Jul 10, 2020, 10:38 AM Eric Badger
> > >> >>> > <ebad...@verizonmedia.com> wrote:
> > >> >>> >
> > >> >>> > > Thanks for writing this up, Carlo. I'm +1 (idk if I'm
> > >> >>> > > technically binding on this or not) for the changes moving
> > >> >>> > > forward and I think we should refactor away any instances that
> > >> >>> > > are internal to the code (i.e. not APIs or other things that
> > >> >>> > > would break compatibility) in all active branches and then also
> > >> >>> > > change the APIs in trunk (an incompatible change).
> > >> >>> > >
> > >> >>> > > I just came across an internal issue related to the NM
> > >> >>> > > whitelist/blacklist. I would be happy to go refactor the code
> > >> >>> > > and look for instances of these and replace them with
> > >> >>> > > allowlist/blocklist. Doing a quick "git grep" of trunk, I see
> > >> >>> > > 270 instances of "whitelist" and 1318 instances of "blacklist".
> > >> >>> > >
> > >> >>> > > If there are no objections, I'll create a JIRA to clean this
> > >> >>> > > specific stuff up.
> > >> >>> > > It would be wonderful if others could pick up a different
> > >> >>> > > portion (e.g. master/slave) so that we can spread the work out.
> > >> >>> > >
> > >> >>> > > Eric
> > >> >>> > >
> > >> >>> > > On Tue, Jul 7, 2020 at 6:27 PM Carlo Aldo Curino
> > >> >>> > > <carlo.cur...@gmail.com> wrote:
> > >> >>> > >
> > >> >>> > >> Hello Folks,
> > >> >>> > >>
> > >> >>> > >> I hope you are all doing well...
> > >> >>> > >>
> > >> >>> > >> *The problem*
> > >> >>> > >> The recent protests made me realize that we are not just
> > >> >>> > >> bystanders of the systematic racism that affects our society,
> > >> >>> > >> but we are active participants of it. Being "non-racist" is
> > >> >>> > >> not enough, I strongly feel we should be actively
> > >> >>> > >> "anti-racist" in our day to day lives, and continuously check
> > >> >>> > >> our biases. I assume most of you will agree with the general
> > >> >>> > >> sentiment, but based on your exposure to the recent events and
> > >> >>> > >> US culture/history might have more or less strong feelings
> > >> >>> > >> about your role in the problem and potential solution.
> > >> >>> > >>
> > >> >>> > >> *What can we do about it?* I think a simple action we can take
> > >> >>> > >> is to work on our code/comments/documentation/websites and
> > >> >>> > >> remove racist terminology. Here is an IETF draft to fix up
> > >> >>> > >> some of the most egregious examples (master/slave,
> > >> >>> > >> whitelist/blacklist) with proposed alternatives.
> > >> >>> > >>
> > >> >>> > >> https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1
> > >> >>> > >>
> > >> >>> > >> Also as we go about this effort, we should also consider other
> > >> >>> > >> "non-inclusive" terminology issues around gender (e.g., binary
> > >> >>> > >> gendered examples, "Alice" doing the wrong security thing
> > >> >>> > >> systematically), and ableism (e.g., referring to misbehaving
> > >> >>> > >> hardware as "lame" or "limping", etc.).
> > >> >>> > >> The easiest action item is to avoid this going forward
> > >> >>> > >> (ideally adding it to the checkstyles if possible), a more
> > >> >>> > >> costly one is to start going back and refactor away existing
> > >> >>> > >> instances.
> > >> >>> > >>
> > >> >>> > >> I know this requires a bunch of work as refactorings might
> > >> >>> > >> break dev branches and non-committed patches, possibly
> > >> >>> > >> scripts, etc. but I think this is something important and
> > >> >>> > >> relatively simple we can do. The effect goes well beyond some
> > >> >>> > >> text in github, it signals what we believe in, and forces
> > >> >>> > >> hundreds of users and contributors to notice and think about
> > >> >>> > >> it. Our force-multiplier is huge and it matches our
> > >> >>> > >> responsibility.
> > >> >>> > >>
> > >> >>> > >> What do you folks think?
> > >> >>> > >>
> > >> >>> > >> Thanks,
> > >> >>> > >> Carlo
> > >> >>
> > >> >> --
> > >> >> Best Regards,
> > >> >>
> > >> >> *Ahmed Hussein, PhD*