Slides attached. Thanks for taking notes Chris!
On Fri, May 28, 2010 at 5:37 PM, Chris Douglas <cdoug...@apache.org> wrote: > This month, the MapReduce + HDFS contributor meeting was held at > Cloudera Headquarters. > > Announcements for contributor meetings are here: > http://www.meetup.com/Hadoop-Contributors/ > > Minutes follow. No decisions were made at this meeting, but the > following issues were discussed and may presage future discussion and > decisions on these lists. > > Eli, I think you have all the slides. Would you mind sending them out? -C > > == 0.21 release update == > * Continuing to close blockers, ping people for updates and suggestions > * About 20 open blockers. Many are MapReduce documentation that may be > pushed. Speak up if 0.21 is missing anything substantive. > * Common/HDFS visibility and annotations are close to consensus; > MapReduce annotations are committed to trunk and the 0.21 branch > > == HEP proposal == > (what follows is the sketch presented at the meeting. A full proposal > with concrete details will be circulated on the list) > > * Based on- and very similar to- the PEP (Python Enhancement Proposal) Process > * Audience is HDFS and MapReduce; not necessarily adopted by other subprojects > - Addresses the perception that there is friction between > innovation/experimentation and stability > * Not for small enhancements, features, and bug fixes. This should not > slow down typical development or impede casual contribution to Hadoop > * Primary mechanism for new features, collecting input, documenting > design decisions > * JIRA is good for details, but not for deciding on wide shifts in direction > * Purpose is for author to build consensus and gather dissenting opinions. > - All may comment, but Editors will review incoming HEP material > - Editors determine only whether the HEP is complete, not whether > they believe it is a sound idea > - Editors are appointed by the PMC > - Mechanism for appointing Editors and term of service TBD > - Apache Board appoints Shepherds for projects somewhat randomly, > to projects. A similar mechanism could work for incoming HEPs > - Proposal *may* come with code, but not necessarily. > Drafting/baking of the HEP occurs in public on a list dedicated to > that particular proposal. Once Editors certify the HEP as complete, it > is sent to general@ for wider discussion. > - The discussion phase begins on gene...@. The mailing list exists > to ensure the HEP is complete enough to present to the community. > - Some discussion on the difference between posting to general@ and > posting to the HEP list. Completeness is, of course, subjective. If > the Editor and Author disagree whether the proposal affects an aspect > of the framework enough to merit special consideration, it is not > entirely clear how to resolve the disagreement. > - In general, the role of the Editor in the community-driven > process of Hadoop is not entirely clear. It may be possible to > optimize it out. > - Once discussion ends, the HEP is passed (or fails to pass) by a > vote of the PMC (mechanics undefined). In Python, the result is > committed to the repository. A similar practice would make sense in > Hadoop. > * Which issues require HEPs? > - Discussion ranged. Append, backup namenode, edit log rewrite, et > al. were examples of features substantial enough to merit a HEP. Pure > Java CRC is an example of an enhancement that would not. Whether an > explicit process must be in place to determine whether an issue > requires a HEP is not clear. > - Viewing HEPs as a way of soliciting consensus for an approach > might be more accurate. Going through the HEP process should always > improve the chances of a successful proposal > > * Evaluation > - The proposal may be rejected if it is redundant with existing > functionality, technically unsound, insufficiently motivated, no > backwards compatibility story, etc. > - Implementation is not necessary, and is lightly discouraged. > Feedback is less welcome once code is in hand. > - Purpose is to be clear about the acceptance criteria for that > issue, e.g. concerns that the proposal may not scale or may harm > performance > - Dissenting opinions must be recorded accurately. Quoting would be > a safe practice for the Author to encourage HEP reviewers not to block > the product of the proposal. > > * The testing burden and completion strategy may be ambiguous > - Whether the proposal affects scalability may not be testable by > the implementer. Completing the proposal to address all use cases may > require considerably more work than the Author is willing or motivated > to invest. > - The HEP discussion on general@ should explore whether such > objections are merited and reasonable. For example, a particularly > obscure/esoteric use case could be included as a condition for > acceptance if the dissenter is willing to invest the resources to > test/validate it. The process is flexible in this regard. > - But it is not infinitely flexible. Backwards compatibility, > performance regression, availability, and other considerations need > not be called out in every HEP. > - Traditional concerns need to be documented. Acceptance criteria > should ideally be automated and reproducible in different > organizations > > == Branching == > * A patch and a branch are isomorphic from a policy perspective. Of > course, they are functionally distinct: branches are easier to > collaborate on and are, generally, longer-lived than are patches. But > special policies need not be derived to account for these differences, > which concern the production of the code, not its review and > acceptance. > * Some developers find branches to be easier to review than very large > patches and easier to merge, given a toolchain that supports this. > - Subversion currently is difficult to adapt to this model > - Could be done on a HEP-by-HEP basis, as a condition for acceptance > * Eclipse Labs > - Branded version of Google Code (same functionality, w/ Eclipse brand) > - Not official Eclipse projects, but associated with Eclipse > - Apache/Hadoop may consider a similar strategy > - Distinct from Apache Labs, as one need not be a committer, follow > its rules for releases, etc. > > == Contrib == > * Modules (such as fuse-dfs) are not actively maintained in the main > repository and would benefit from a release schedule decoupled from > the rest of Hadoop > * With few exceptions, the contrib modules have smaller, often > discrete groups of maintainers. It may be worth exploring whether > these projects could live elsewhere >