Question: How would a user end up with the same shaded dependency twice? The shaded dependencies are transitive dependencies of Beam and thus, this shouldn't happen. Is this a safe-guard when running different versions of Beam in the same JVM?

Other than the search/replace issue, the full version string is probably more elegant and less error-prone, so I'm fine with that.

For the groupid/artifact name I prefer:

groupid: org.apache.beam.vendored
artifact: guava-20_0

On 24.10.18 21:53, Lukasz Cwik wrote:
On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <[email protected] <mailto:[email protected]>> wrote:

    OK. I just opened https://github.com/apache/beam/pull/6809 to push
    Guava through. I made some comments there, and also I agree with
    Luke that full version string makes sense. For this purpose it seems
    easy and fine to do a search/replace to swap 20.0 for 20.1, and
    compatibility between them should not be a concern.

    I have minor suggestions and clarifications:

      - Is there value to `beam` in the artifactId? I would leave it off
    unless there's a special need

It would only provide consistency with all our other artifactIds that we publish but there isn't a special need that I'm aware of.

      - Users should never use these and we make it extremely clear they
    are not supported for any reasons

      - Use 0.x versions indicating no intention of semantic versioning

I like this idea a lot.


    Bringing my comments and Luke's together, here's the proposal:

         groupId: org.apache.beam
         artifactId: vendored-guava-20_0
         namespace: org.apache.beam.vendored.guava.v20_0
         version: 0.1

    Alternatively it could be

         groupId: org.apache.beam-vendored
         artifactid: guava-20_0
         namespace: org.apache.beam.vendored.guava.v20_0
         version: 0.1

    I like the latter but I haven't gone through the process of
    establishing a new groupId.

Based on https://maven.apache.org/guides/mini/guide-naming-conventions.html, the alternative groupId should be org.apache.beam.vendored and not org.apache.beam-vendored I slightly prefer org.apache.beam over org.apache.beam.vendored but not enough to object to either choice as long as we maintain consistency for all vendored dependencies we produce going forward.

    And for now we do not publish source jars. A couple of TODOs to get
    the build in good shape (classifiers, jars, interaction with plugins)

    Kenn


    On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <[email protected]
    <mailto:[email protected]>> wrote:

        It looks like we are agreeing to make each vendored dependency
        self contained and have all their own internal dependencies
        packaged. For example, gRPC and all its transitive dependencies
        would use org.apache.beam.vendored.grpc.vYYY and Calcite and all
        its transitive dependencies would use
        org.apache.beam.vendored.calcite.vZZZ.

        I also wanted to circle back on this question I had earlier that
        didn't have any follow-up:
        Currently we are relocating code depending on the version
        string. If the major version is >= 1, we use only the major
        version within the package string and rely on semantic
        versioning provided by the dependency to not break people. If
        the major version is 0, we assume the dependency is unstable and
        use the full version as part of the package string during
        relocation.

        The downside of using the full version string for relocated
        packages:
        1) Users will end up with multiple copies of dependencies that
        differ only by the minor or patch version increasing the size of
        their application.
        2) Bumping up the version of a dependency now requires the
        import statement in all java files to be updated (not too
        difficult with some sed/grep skills)

        The upside of using the full version string in the relocated
        package:
        1) We don't have to worry about whether a dependency maintains
        semantic versioning which means our users won't have to worry
        about that either.
        2) This increases the odds that a user will load multiple
        slightly different versions of the same dependency which is
        known to be incompatible in certain situations (e.g. Netty
        4.1.25 can't be on the classpath with Netty 4.1.28 even though
        they are both shaded due to issues of how JNI with tcnative works).

        My preference would be to use the full version string for import
        statements (so org.apache.beam.vendor.grpc.v1_13_1...) since
        this would allow multiple copies to not conflict with each other
        since in my opinion it is a lot more difficult to help a user
        debug a dependency issue then to use string replacement during
        dependency upgrades to fix import statements. Also I would
        suggest we name the artifacts in Maven as follows:
        groupId: org.apache.beam
        artifactId: beam-vendor-grpc-v1_13_1
        version: 1.0.0 (first version and subsequent versions such as
        1.0.1 are only for patch upgrades that fix any shading issues we
        may have had when producing the vendored jar)


        On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels
        <[email protected] <mailto:[email protected]>> wrote:

            Would also keep it simple and optimize for the JAR size only
            if necessary.

            On 24.10.18 00:06, Kenneth Knowles wrote:
             > I think it makes sense for each vendored dependency to be
            self-contained
             > as much as possible. It should keep it fairly simple.
            Things that cross
             > their API surface cannot be hidden, of course. Jar size
            is not a concern
             > IMO.
             >
             > Kenn
             >
             > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik
            <[email protected] <mailto:[email protected]>
             > <mailto:[email protected] <mailto:[email protected]>>> wrote:
             >
             >     How should we handle the transitive dependencies of
            the things we
             >     want to vendor?
             >
             >     For example we use gRPC which depends on Guava 20 and
            we also use
             >     Calcite which depends on Guava 19.
             >
             >     Should the vendored gRPC/Calcite/... be
            self-contained so it
             >     contains all its dependencies, hence vendored gRPC
            would contain
             >     Guava 20 and vendored Calcite would contain Guava 19
            (both under
             >     different namespaces)?
             >     This leads to larger jars but less vendored
            dependencies to maintain.
             >
             >     Or should we produce a vendored library for those
            that we want to
             >     share, e.g. Guava 20 that could be reused across
            multiple vendored
             >     libraries?
             >     Makes the vendoring process slightly more
            complicated, more
             >     dependencies to maintain, smaller jars.
             >
             >     Or should we produce a vendored library for each
            dependency?
             >     Lots of vendoring needed, likely tooling required to
            be built to
             >     maintain this.
             >
             >
             >
             >
             >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles
            <[email protected] <mailto:[email protected]>
             >     <mailto:[email protected] <mailto:[email protected]>>> wrote:
             >
             >         I actually created the subtasks by finding things
            shaded by at
             >         least one module. I think each one should
            definitely have an on
             >         list discussion that clarifies the target
            artifact, namespace,
             >         version, possible complications, etc.
             >
             >         My impression is that many many modules shade
            only Guava. So for
             >         build time and simplification that is a big win.
             >
             >         Kenn
             >
             >         On Tue, Oct 23, 2018, 08:16 Thomas Weise
            <[email protected] <mailto:[email protected]>
             >         <mailto:[email protected] <mailto:[email protected]>>>
            wrote:
             >
             >             +1 for separate artifacts
             >
             >             I would request that we explicitly discuss
            and agree which
             >             dependencies we vendor though.
             >
             >             Not everything listed in the JIRA subtasks is
            currently
             >             relocated.
             >
             >             Thomas
             >
             >
             >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
             >             <[email protected]
            <mailto:[email protected]>
            <mailto:[email protected]
            <mailto:[email protected]>>>
             >             wrote:
             >
             >                 +1 This should improve build times a lot.
            It would be
             >                 great if vendored deps could stay in the
            main repository.
             >
             >                 D.
             >
             >                 On Tue, Oct 23, 2018 at 12:21 PM
            Maximilian Michels
             >                 <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>> wrote:
             >
             >                     Looks great, Kenn!
             >
             >                      > Max: what is the story behind
            having a separate
             >                     flink-shaded repo? Did that make it
            easier to manage
             >                     in some way?
             >
             >                     Better separation of concerns, but I
            don't think
             >                     releasing the shaded
             >                     artifacts from the main repo is a
            problem. I'd even
             >                     prefer not to split
             >                     up the repo because it makes updates
            to the vendored
             >                     dependencies
             >                     slightly easier.
             >
             >                     On 23.10.18 03:25, Kenneth Knowles wrote:
             >                      > OK, I've filed
             > https://issues.apache.org/jira/browse/BEAM-5819 to
             >                      > collect sub-tasks. This has enough
            upsides
             >                     throughout lots of areas of
             >                      > the project that even though it is
            not glamorous
             >                     it seems pretty
             >                      > valuable to start on immediately.
            And I want to
             >                     find out if there's a
             >                      > pitfall lurking.
             >                      >
             >                      > Max: what is the story behind
            having a separate
             >                     flink-shaded repo? Did
             >                      > that make it easier to manage in
            some way?
             >                      >
             >                      > Kenn
             >                      >
             >                      > On Mon, Oct 22, 2018 at 2:55 AM
            Maximilian
             >                     Michels <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>
             >                      > <mailto:[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>>
             >                     wrote:
             >                      >
             >                      >     +1 for publishing vendored Jars
             >                     independently. It will improve build
             >                      >     time and ease IntelliJ
            integration.
             >                      >
             >                      >     Flink also publishes shaded
            dependencies
             >                     separately:
             >                      >
             >                      >     -
            https://github.com/apache/flink-shaded
             >                      >     -
             > https://issues.apache.org/jira/browse/FLINK-6529
             >                      >
             >                      >     AFAIK their main motivation
            was to get rid of
             >                     duplicate shaded classes
             >                      >     on the classpath. We don't
            appear to have
             >                     that problem because we
             >                      >     already have a separate
            "vendor" project.
             >                      >
             >                      >      >  - With shading, it is hard
            (impossible?)
             >                     to step into dependency
             >                      >     code in IntelliJ's debugger,
            because the
             >                     actual symbol at runtime
             >                      >     does not match what is in the
            external jars
             >                      >
             >                      >     This would be solved by
            releasing the sources
             >                     of the shaded jars.
             >                      >      From a
             >                      >     legal perspective, this could
            be problematic
             >                     as alluded to here:
             >                      >
            https://github.com/apache/flink-shaded/issues/25
             >                      >
             >                      >     -Max
             >                      >
             >                      >     On 20.10.18 01:11, Lukasz Cwik
            wrote:
             >                      >      > I have tried several times
            to improve the
             >                     build system and intellij
             >                      >      > integration and each
            attempt ended with
             >                     little progress when dealing
             >                      >      > with vendored code. My
            latest attempt has
             >                     been the most promising
             >                      >     where
             >                      >      > I take the vendored
            classes/jars and
             >                     decompile them generating the
             >                      >      > source that Intellij can
            then use. I have
             >                     a branch[1] that
             >                      >     demonstrates
             >                      >      > the idea. It works pretty
            well (and up
             >                     until a change where we
             >                      >     started
             >                      >      > vendoring gRPC, was
            impractical to do.
             >                     Instructions to try it out
             >                      >     are:
             >                      >      >
             >                      >      > // Clean up any remnants of
            prior
             >                     builds/intellij projects
             >                      >      > git clean -fdx
             >                      >      > // Generated the source for
             >                     vendored/shaded modules
             >                      >      > ./gradlew decompile
             >                      >      >
             >                      >      > // Remove the "generated"
            Java sources for
             >                     protos so they don't
             >                      >     conflict with the decompiled
            sources.
             >                      >      > rm -rf
>  model/pipeline/build/generated/source/proto
             >                      >      > rm -rf
>  model/job-management/build/generated/source/proto
             >                      >      > rm -rf
>  model/fn-execution/build/generated/source/proto
             >                      >      > // Import the project into
            Intellij, most
             >                     code completion now
             >                      >     works still some issues with a
            few classes.
             >                      >      > // Note that the Java
            decompiler doesn't
             >                     generate valid source so
             >                      >     still need to delegate to
            Gradle for
             >                     build/run/test actions
             >                      >      > // Other decompilers may do
            a better/worse
             >                     job but haven't tried
             >                      >     them.
             >                      >      >
             >                      >      >
             >                      >      > The problems that I face
            are that the
             >                     generated Java source from the
             >                      >      > protos and the decompiled
            source from the
             >                     compiled version of that
             >                      >      > source post shading are
            both being
             >                     imported as content roots and
             >                      >     then
             >                      >      > conflict. Also, the CFR
            decompiler isn't
             >                     producing valid source, if
             >                      >      > people could try others and
            report their
             >                     mileage, we may find one
             >                      >     that
             >                      >      > works and then we would be
            able to use
             >                     intellij to build/run our
             >                      >     code
             >                      >      > and not need to delegate
            all our
             >                     build/run/test actions to Gradle.
             >                      >      >
             >                      >      > After all these attempts I
            have done,
             >                     vendoring the dependencies
             >                      >     outside
             >                      >      > of the project seems like a
            sane approach
             >                     and unless someone
             >                      >     wants to
             >                      >      > take a stab at the best
            progress I have
             >                     made above, I would go
             >                      >     with what
             >                      >      > Kenn is suggesting even
            though it will
             >                     mean that we will need to
             >                      >     perform
             >                      >      > releases every time we want
            to change the
             >                     version of one of our
             >                      >     vendored
             >                      >      > dependencies.
             >                      >      >
             >                      >      > 1:
             > https://github.com/lukecwik/incubator-beam/tree/intellij
             >                      >      >
             >                      >      >
             >                      >      > On Fri, Oct 19, 2018 at
            10:43 AM Kenneth
             >                     Knowles <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>
             >                      >     <mailto:[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>
             >                      >      > <mailto:[email protected]
            <mailto:[email protected]>
             >                     <mailto:[email protected]
            <mailto:[email protected]>> <mailto:[email protected]
            <mailto:[email protected]>
             >                     <mailto:[email protected]
            <mailto:[email protected]>>>>> wrote:
             >                      >      >
             >                      >      >     Another reason to push
            on this is to
             >                     get build times down.
             >                      >     Once only
             >                      >      >     generated proto classes
            use the shadow
             >                     plugin we'll cut the build
             >                      >      >     time in ~half? And
            there is no reason
             >                     to constantly re-vendor.
             >                      >      >
             >                      >      >     Kenn
             >                      >      >
             >                      >      >     On Fri, Oct 19, 2018 at
            10:39 AM
             >                     Kenneth Knowles
             >                      >     <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>
             >                     <mailto:[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>
             >                      >      >     <mailto:[email protected]
            <mailto:[email protected]>
             >                     <mailto:[email protected]
            <mailto:[email protected]>> <mailto:[email protected]
            <mailto:[email protected]>
             >                     <mailto:[email protected]
            <mailto:[email protected]>>>>> wrote:
             >                      >      >
             >                      >      >         Hi all,
             >                      >      >
             >                      >      >         A while ago we had
            pretty good
             >                     consensus that we should
             >                      >     vendor
             >                      >      >         ("pre-shade") specific
             >                     dependencies, and there's start on it
             >                      >      >         here:
             > https://github.com/apache/beam/tree/master/vendor
             >                      >      >
             >                      >      >         IntelliJ notes:
             >                      >      >
             >                      >      >           - With shading,
            it is hard
             >                     (impossible?) to step into
             >                      >      >         dependency code in
            IntelliJ's
             >                     debugger, because the actual
             >                      >      >         symbol at runtime
            does not match
             >                     what is in the external jars
             >                      >      >
             >                      >      >
             >                      >      > Intellij can step through
            the classes if
             >                     they were published
             >                      >     outside the
             >                      >      > project since it can
            decompile them. The
             >                     source won't be legible.
             >                      >      > Decompiling the source as
            in my example
             >                     effectively shows the
             >                      >     same issue.
             >                      >      >
             >                      >      >           - With vendoring,
            if the
             >                     vendored dependencies are part
             >                      >     of the
             >                      >      >         project then
            IntelliJ gets
             >                     confused because it operates on
             >                      >      >         source, not the
            produced jars
             >                      >      >
             >                      >      >
             >                      >      > Yes, I tried several ways
            to get intellij
             >                     to ignore the source
             >                      >     and use
             >                      >      > the output jars but no luck.
             >                      >      >
             >                      >      >         The second one has
            a quick fix for
             >                     most cases*: don't
             >                      >     make the
             >                      >      >         vendored dep a
            subproject, but
             >                     just separately build and
             >                      >     publish
             >                      >      >         it. Since a
            vendored dependency
             >                     should change much more
             >                      >      >         infrequently (or if
            we bake the
             >                     version into the name,
             >                      >     ~never)
             >                      >      >         this means we
            publish once and
             >                     save headache and build
             >                      >     time forever.
             >                      >      >
             >                      >      >         WDYT? Have I
            overlooked something?
             >                     How about we set up
             >                      >     vendored
             >                      >      >         versions of guava,
            protobuf, grpc,
             >                     and publish them. We don't
             >                      >      >         have to actually
            start using them
             >                     yet, and can do it
             >                      >     incrementally.
             >                      >      >
             >                      >      >
             >                      >      > Currently we are relocating
            code depending
             >                     on the version string.
             >                      >     If the
             >                      >      > major version is >= 1, we
            use only the
             >                     major version within the
             >                      >     package
             >                      >      > string and rely on semantic
            versioning
             >                     provided by the dependency
             >                      >     to not
             >                      >      > break people. If the major
            version is 0,
             >                     we assume the dependency is
             >                      >      > unstable and use the full
            version as part
             >                     of the package string
             >                      >     during
             >                      >      > relocation.
             >                      >      >
             >                      >      > The downside of using the
            full version
             >                     string for relocated packages:
             >                      >      > 1) Users will end up with
            multiple copies
             >                     of dependencies that
             >                      >     differ
             >                      >      > only by the minor or patch
            version
             >                     increasing the size of their
             >                      >     application.
             >                      >      > 2) Bumping up the version
            of a dependency
             >                     now requires the import
             >                      >      > statement in all java files
            to be updated
             >                     (not too difficult with
             >                      >     some
             >                      >      > sed/grep skills)
             >                      >      >
             >                      >      > The upside of using the
            full version
             >                     string in the relocated package:
             >                      >      > 1) We don't have to worry
            about whether a
             >                     dependency maintains
             >                      >     semantic
             >                      >      > versioning which means our
            users won't
             >                     have to worry about that
             >                      >     either.
             >                      >      > 2) This increases the odds
            that a user
             >                     will load multiple slightly
             >                      >      > different versions of the
            same dependency
             >                     which is known to be
             >                      >      > incompatible in certain
            situations (e.g.
             >                     Netty 4.1.25 can't be on
             >                      >     the
             >                      >      > classpath with Netty 4.1.28
            even though
             >                     they are both shaded due to
             >                      >      > issues of how JNI with
            tcnative works).
             >                      >      >
             >                      >      >
             >                      >      >         (side note: what do
            other projects
             >                     like Flink do?)
             >                      >      >
             >                      >      >         Kenn
             >                      >      >
             >                      >      >         *for generated
            proto classes, they
             >                     need to be altered after
             >                      >      >         being generated so
            shading happens
             >                     there, but actually only
             >                      >      >         relocation and the
            shared vendored
             >                     dep should work
             >                      >     elsewhere in
             >                      >      >         the project
             >                      >      >
             >                      >      >
             >                      >      > We could publish the protos
            and treat them
             >                     as "external"
             >                      >     dependencies
             >                      >      > within the Java projects
            which would also
             >                     remove this pain point.
             >                      >
             >

Reply via email to