
Eli Collins Fri, 28 May 2010 20:00:11 -0700

Slides attached.  Thanks for taking notes Chris!


On Fri, May 28, 2010 at 5:37 PM, Chris Douglas <cdoug...@apache.org> wrote:
> This month, the MapReduce + HDFS contributor meeting was held at
> Cloudera Headquarters.
>
> Announcements for contributor meetings are here:
> http://www.meetup.com/Hadoop-Contributors/
>
> Minutes follow. No decisions were made at this meeting, but the
> following issues were discussed and may presage future discussion and
> decisions on these lists.
>
> Eli, I think you have all the slides. Would you mind sending them out? -C
>
> == 0.21 release update ==
> * Continuing to close blockers, ping people for updates and suggestions
> * About 20 open blockers. Many are MapReduce documentation that may be
> pushed. Speak up if 0.21 is missing anything substantive.
> * Common/HDFS visibility and annotations are close to consensus;
> MapReduce annotations are committed to trunk and the 0.21 branch
>
> == HEP proposal ==
> (what follows is the sketch presented at the meeting. A full proposal
> with concrete details will be circulated on the list)
>
> * Based on, and very similar to, the PEP (Python Enhancement Proposal) process
> * Audience is HDFS and MapReduce; not necessarily adopted by other subprojects
>  - Addresses the perception that there is friction between
> innovation/experimentation and stability
> * Not for small enhancements, features, and bug fixes. This should not
> slow down typical development or impede casual contribution to Hadoop
> * Primary mechanism for new features, collecting input, documenting
> design decisions
> * JIRA is good for details, but not for deciding on wide shifts in direction
> * Purpose is for author to build consensus and gather dissenting opinions.
>  - All may comment, but Editors will review incoming HEP material
>  - Editors determine only whether the HEP is complete, not whether
> they believe it is a sound idea
>  - Editors are appointed by the PMC
>  - Mechanism for appointing Editors and term of service TBD
>    - The Apache Board appoints Shepherds to projects somewhat randomly.
> A similar mechanism could work for incoming HEPs
>  - Proposal *may* come with code, but not necessarily.
> Drafting/baking of the HEP occurs in public on a list dedicated to
> that particular proposal. Once Editors certify the HEP as complete, it
> is sent to general@ for wider discussion.
>    - The discussion phase begins on gene...@. The proposal's dedicated
> list exists to ensure the HEP is complete enough to present to the
> community.
>  - Some discussion on the difference between posting to general@ and
> posting to the HEP list. Completeness is, of course, subjective. If
> the Editor and Author disagree whether the proposal affects an aspect
> of the framework enough to merit special consideration, it is not
> entirely clear how to resolve the disagreement.
>    - In general, the role of the Editor in the community-driven
> process of Hadoop is not entirely clear. It may be possible to
> optimize it out.
>  - Once discussion ends, the HEP is passed (or fails to pass) by a
> vote of the PMC (mechanics undefined). In Python, the result is
> committed to the repository. A similar practice would make sense in
> Hadoop.
> * Which issues require HEPs?
>  - Discussion ranged. Append, backup namenode, edit log rewrite, et
> al. were examples of features substantial enough to merit a HEP. Pure
> Java CRC is an example of an enhancement that would not. Whether an
> explicit process must be in place to determine whether an issue
> requires a HEP is not clear.
>  - Viewing HEPs as a way of soliciting consensus for an approach
> might be more accurate. Going through the HEP process should always
> improve the chances of a successful proposal
>
> * Evaluation
>  - The proposal may be rejected if it is redundant with existing
> functionality, technically unsound, insufficiently motivated, lacking a
> backwards compatibility story, etc.
>  - Implementation is not necessary, and is lightly discouraged.
> Feedback is less welcome once code is in hand.
>  - Purpose is to be clear about the acceptance criteria for that
> issue, e.g. concerns that the proposal may not scale or may harm
> performance
>  - Dissenting opinions must be recorded accurately. Quoting dissenters
> directly would be a safe practice for the Author, and should encourage
> HEP reviewers not to block the product of the proposal.
>
> * The testing burden and completion strategy may be ambiguous
>  - Whether the proposal affects scalability may not be testable by
> the implementer. Completing the proposal to address all use cases may
> require considerably more work than the Author is willing or motivated
> to invest.
>  - The HEP discussion on general@ should explore whether such
> objections are merited and reasonable. For example, a particularly
> obscure/esoteric use case could be included as a condition for
> acceptance if the dissenter is willing to invest the resources to
> test/validate it. The process is flexible in this regard.
>    - But it is not infinitely flexible. Backwards compatibility,
> performance regression, availability, and other considerations need
> not be called out in every HEP.
>    - Traditional concerns need to be documented. Acceptance criteria
> should ideally be automated and reproducible in different
> organizations
>
> == Branching ==
> * A patch and a branch are isomorphic from a policy perspective. Of
> course, they are functionally distinct: branches are easier to
> collaborate on and are, generally, longer-lived than are patches. But
> special policies need not be derived to account for these differences,
> which concern the production of the code, not its review and
> acceptance.
> * Some developers find branches to be easier to review than very large
> patches and easier to merge, given a toolchain that supports this.
>  - Subversion is currently difficult to adapt to this model
>  - Could be done on a HEP-by-HEP basis, as a condition for acceptance
> * Eclipse Labs
>  - Branded version of Google Code (same functionality, w/ Eclipse brand)
>  - Not official Eclipse projects, but associated with Eclipse
>  - Apache/Hadoop may consider a similar strategy
>  - Distinct from Apache Labs, as one need not be a committer, follow
> its rules for releases, etc.
>
> == Contrib ==
> * Modules (such as fuse-dfs) are not actively maintained in the main
> repository and would benefit from a release schedule decoupled from
> the rest of Hadoop
> * With few exceptions, the contrib modules have smaller, often
> discrete groups of maintainers. It may be worth exploring whether
> these projects could live elsewhere
>

Re: Contributor Meeting Minutes 05/28/2010
