I'm very unhappy with this direction. In particular, I don't think git is a
good place for distribution of binary artifacts. Furthermore, the PMC
shouldn't be releasing anything without a release vote.

I'd propose that we make a third party module that contains the *source* of
the pom files to build the relocated jars. This should absolutely be
treated as a last resort for the mostly Google projects that regularly
break binary compatibility (e.g., Protobuf & Guava).

In terms of naming, I'd propose something like:

org.apache.hadoop.thirdparty.protobuf2_5
org.apache.hadoop.thirdparty.guava28

In particular, I think we absolutely need to include the version of the
underlying project. On the other hand, since we should not be shading
*everything* we can drop the leading com.google.
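
As a rough sketch of what the pom source for one relocated jar might
contain, assuming maven-shade-plugin does the relocation (the target
package follows the naming proposed above; the plugin version and the rest
of the pom are left out and would need to be filled in):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shading runs when the jar is packaged -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- original package of the underlying project -->
            <pattern>com.google.protobuf</pattern>
            <!-- assumed target: versioned, without the leading com.google -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf2_5</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```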

The Hadoop project can make releases of the thirdparty module:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
  <version>1.0</version>
</dependency>

Note that the version has to be the hadoop thirdparty release number, which
is part of why you need to have the underlying version in the artifact
name. These we can push to maven central as new releases from Hadoop.

Thoughts?

.. Owen

On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakum...@apache.org>
wrote:

> Hi All,
>
>    I wanted to discuss the separate repo for thirdparty dependencies
> which we need to shade and include in Hadoop components' jars.
>
>    Apologies for the long text ahead, but this needs a clear explanation.
>
>    Right now the most needed such dependency is protobuf. The protobuf
> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> builds, which depend on the transitive protobuf dependency coming from
> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> guarantee source compatibility, though it guarantees wire compatibility
> between versions. Because of this behavior, a version upgrade may cause
> breakage in known and unknown (private?) downstreams.
>
>    So to tackle this, we came up with the following proposal in HADOOP-13363.
>
>    Luckily, as far as I know, no APIs, either public to users or between
> Hadoop processes, directly use protobuf classes in signatures. (If
> any exist, please let us know).
>
>    Proposal:
>    ------------
>
>    1. Create artifact(s) which contain the shaded dependencies. All such
> shading/relocation will use the known prefix
> **org.apache.hadoop.thirdparty.**.
>    2. Start with the protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf); all **com.google.protobuf**
> classes will be relocated to
> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>    3. Hadoop modules which need protobuf as a dependency will add this
> shaded artifact as a dependency (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf).
>    4. All previous usages of "com.google.protobuf" will be changed to
> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> committed. Please note, this replacement is done one time, directly in the
> source code, NOT during compile and package.
>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
> does not care which version of the original "protobuf-java" is in the
> dependency tree.
>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> break the downstreams. But hadoop will actually be using the latest
> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>
>    7. Coming back to the separate repo, the following are the main reasons
> for keeping the shaded dependency artifact in a separate repo instead of a
> submodule.
>
>       7a. These artifacts need not be built all the time. They need to be
> built only when there is a change in the dependency version or the build
> process.
>       7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
> will execute only in the package phase. That means "mvn compile" or "mvn
> test-compile" will fail, because at that point this artifact will not have
> the relocated classes; instead it will have the original classes, resulting
> in a compilation failure. The workaround is to build the thirdparty
> submodule first and exclude the "thirdparty" submodule in other executions.
> This would be a complex process compared to keeping it in a separate repo.
>
>       7c. The separate repo will be a subproject of Hadoop, using the same
> HADOOP jira project, with different versioning prefixed with "thirdparty-"
> (ex: thirdparty-1.0.0).
>       7d. The separate repo will have the same release process as Hadoop.
>
>
>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> an umbrella jira tracking the changes for the protobuf upgrade.
>
>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> raised for the separate repo creation in HADOOP-16595
> (https://issues.apache.org/jira/browse/HADOOP-16595).
>
>     Please provide your inputs for the proposal and review the PR to
> proceed with the proposal.
>
>
>    -Thanks,
>     Vinay
>
> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> vino...@apache.org>
> wrote:
>
> > Moving the thread to the dev lists.
> >
> > Thanks
> > +Vinod
> >
> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakum...@apache.org>
> > wrote:
> > >
> > > Thanks Marton,
> > >
> > > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > > Whether to use that repo for the shaded artifact or not will be
> > > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
> > > > the discussion.
> > > >
> > > > No existing codebase is being moved out of the hadoop repo, so I
> > > > think right now we are good to go.
> > >
> > > -Vinay
> > >
> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <e...@apache.org> wrote:
> > >
> > >>
> > > >> I am not sure whether it's defined when a vote is required.
> > >>
> > >> https://www.apache.org/foundation/voting.html
> > >>
> > > >> Personally I think it's a big enough change to send a notification
> > > >> to the dev lists with a 'lazy consensus' closure.
> > >>
> > >> Marton
> > >>
> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakum...@apache.org>
> wrote:
> > >>> Hi,
> > >>>
> > > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
> > > >>> in the future) will be kept as a shaded artifact in a separate
> > > >>> repo, which will be referenced as a dependency in hadoop modules.
> > > >>> This approach avoids shading every submodule during the build.
> > >>>
> > > >>> So the question is: is any VOTE required before asking to create
> > > >>> a git repo?
> > >>>
> > > >>> On the self-serve platform https://gitbox.apache.org/setup/newrepo.html
> > > >>> I can see that the requester should be a PMC member.
> > >>>
> > >>> Wanted to confirm here first.
> > >>>
> > >>> -Vinay
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: private-unsubscr...@hadoop.apache.org
> > >> For additional commands, e-mail: private-h...@hadoop.apache.org
> > >>
> > >>
> >
> >
>
