Thanks Wei-Chiu for sharing this great list! I'll try to help with the reviewing as well.
I think it may be worthwhile to also maintain a list like this in the long term: some kind of dashboard showing which patches are pending review, which patches are stale, and so on, in different categories. The Spark community maintains a dashboard at https://spark-prs.appspot.com/ which seems very useful.

Chao

On Sun, Jun 23, 2019 at 8:24 PM Wei-Chiu Chuang <weic...@cloudera.com.invalid> wrote:

> Thank you Xiaoqiao.
>
> I have been pondering about fostering a better community, one that
> advocates more collaboration. It is my intent to make that happen.
>
> Clearly, a lot of great success has happened in this community, and this
> is a highly professional community.
>
> But we set the bar so high that I feel it is not very friendly to
> newbies. And clearly, I see a lot of folks eager to contribute to this
> project.
> How can we work together to make this a more newbie-friendly community?
> That is my question.
>
> I think one observation is that there are only a limited number of active
> committers in the HDFS project (take this year for example, only 5
> committers have made more than 10 HDFS commits). Limited review bandwidth
> means some patches are left unreviewed. We typically nominate a new
> committer when he/she has made a sizable amount of contributions.
> Without sufficient review bandwidth, it gets harder for a
> contributor to progress into a committer. Eventually, the HDFS project
> goes into a slow death when we are unable to nominate new committers
> faster than committers go inactive.
>
> The jira isn't a reviewer-friendly place either. I see a lot of great
> ideas left unreviewed, or unresolved, because it is hard to track what I
> am reviewing or what I plan to do with a jira. We keep a similar
> spreadsheet at Cloudera for patches made available by Clouderans. But
> there's no reason why we can't do this across all contributors, as long
> as people find it useful.
>
> On Sat, Jun 22, 2019 at 3:17 AM Xiaoqiao He <xq.he2...@gmail.com> wrote:
>
> > Thanks Wei-Chiu for your great work.
> >
> > All the JIRAs listed are very valuable and I would like to try my best
> > to participate in reviewing and give some feedback.
> > On the other hand, I think there are also some helpful JIRAs that have
> > not been dug up.
> > Does the spreadsheet support inserting more candidate JIRAs about
> > performance? (to Wei-Chiu)
> >
> Please feel free to enhance the spreadsheet.
>
> >
> > Some other discussion:
> > a. I suggest that we go through all JIRAs regularly and report some
> > performance improvement JIRAs. Of course it really takes up lots of
> > time, and I believe many contributors would like to participate.
> > Meanwhile it may be a good topic for the community sync up (cc @Wangda).
>
> Sounds like a great idea. It would also be a great opportunity to talk
> about a bigger initiative. For example, I see a few folks from Xiaomi
> doing really good work there, and I'd be interested to learn more.
>
> >
> > b. Beyond that, I think we should also scan some BUG JIRAs (for instance
> > HDFS-12862) that were reported but have not been fixed up to now.
> > Thanks Wei-Chiu again.
> >
> > Best Regards,
> > Hexiaoqiao
> >
> >
> > On Sat, Jun 22, 2019 at 11:47 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> >
> > > I spent the past week going through most of the jiras with a patch
> > > attached in the past, and turned up some really good stuff to help
> > > improve HDFS performance.
> > >
> > > The jiras are listed in the following spreadsheet. If you are
> > > interested in reviewing those jiras, please update the spreadsheet
> > > and add yourself as a reviewer. A reviewer does not need to be a
> > > Hadoop committer, but it helps to give the author feedback.
> > >
> > > https://docs.google.com/spreadsheets/d/1dvLoZ039ZirdZF9p0wWKhFCtD91jfbdkPg4XZ-AnMNg/edit?usp=sharing
> > >
> > > I am doing this exercise to identify known performance limitations
> > > and fixes that were submitted but never got committed. There are
> > > cases where a patch was reviewed or even blessed with a +1, but
> > > wasn't pushed to the repo; there are cases where good ideas never
> > > got reviewed.
> > >
> > > I think this is the low-hanging fruit that we as a community should
> > > pick.
> > >
> > > I use this filter to search for Hadoop/HDFS patches, if you are
> > > interested:
> > >
> > > https://issues.apache.org/jira/issues/?filter=12311124&jql=project%20in%20(HADOOP%2C%20HDFS)%20AND%20status%20%3D%20%22Patch%20Available%22%20ORDER%20BY%20updated%20DESC%2C%20key%20DESC
> > >
> > > Best,
> > > Wei-Chiu