Would also keep it simple and optimize for the JAR size only if necessary.

On 24.10.18 00:06, Kenneth Knowles wrote:
I think it makes sense for each vendored dependency to be self-contained as much as possible. It should keep it fairly simple. Things that cross their API surface cannot be hidden, of course. Jar size is not a concern IMO.

Kenn

On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected] <mailto:[email protected]>> wrote:

    How should we handle the transitive dependencies of the things we
    want to vendor?

    For example we use gRPC which depends on Guava 20 and we also use
    Calcite which depends on Guava 19.

    Should the vendored gRPC/Calcite/... be self-contained so it
    contains all its dependencies, hence vendored gRPC would contain
    Guava 20 and vendored Calcite would contain Guava 19 (both under
    different namespaces)?
    This leads to larger jars but less vendored dependencies to maintain.

    Or should we produce a vendored library for those that we want to
    share, e.g. Guava 20 that could be reused across multiple vendored
    libraries?
    Makes the vendoring process slightly more complicated, more
    dependencies to maintain, smaller jars.

    Or should we produce a vendored library for each dependency?
    Lots of vendoring needed, likely tooling required to be built to
    maintain this.




    On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]
    <mailto:[email protected]>> wrote:

        I actually created the subtasks by finding things shaded by at
        least one module. I think each one should definitely have an on
        list discussion that clarifies the target artifact, namespace,
        version, possible complications, etc.

        My impression is that many many modules shade only Guava. So for
        build time and simplification that is a big win.

        Kenn

        On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]
        <mailto:[email protected]>> wrote:

            +1 for separate artifacts

            I would request that we explicitly discuss and agree which
            dependencies we vendor though.

            Not everything listed in the JIRA subtasks is currently
            relocated.

            Thomas


            On Tue, Oct 23, 2018 at 8:04 AM David Morávek
            <[email protected] <mailto:[email protected]>>
            wrote:

                +1 This should improve build times a lot. It would be
                great if vendored deps could stay in the main repository.

                D.

                On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
                <[email protected] <mailto:[email protected]>> wrote:

                    Looks great, Kenn!

                     > Max: what is the story behind having a separate
                    flink-shaded repo? Did that make it easier to manage
                    in some way?

                    Better separation of concerns, but I don't think
                    releasing the shaded
                    artifacts from the main repo is a problem. I'd even
                    prefer not to split
                    up the repo because it makes updates to the vendored
                    dependencies
                    slightly easier.

                    On 23.10.18 03:25, Kenneth Knowles wrote:
                     > OK, I've filed
                    https://issues.apache.org/jira/browse/BEAM-5819 to
                     > collect sub-tasks. This has enough upsides
                    throughout lots of areas of
                     > the project that even though it is not glamorous
                    it seems pretty
                     > valuable to start on immediately. And I want to
                    find out if there's a
                     > pitfall lurking.
                     >
                     > Max: what is the story behind having a separate
                    flink-shaded repo? Did
                     > that make it easier to manage in some way?
                     >
                     > Kenn
                     >
                     > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
                    Michels <[email protected] <mailto:[email protected]>
                     > <mailto:[email protected] <mailto:[email protected]>>>
                    wrote:
                     >
                     >     +1 for publishing vendored Jars
                    independently. It will improve build
                     >     time and ease IntelliJ integration.
                     >
                     >     Flink also publishes shaded dependencies
                    separately:
                     >
                     >     - https://github.com/apache/flink-shaded
                     >     -
                    https://issues.apache.org/jira/browse/FLINK-6529
                     >
                     >     AFAIK their main motivation was to get rid of
                    duplicate shaded classes
                     >     on the classpath. We don't appear to have
                    that problem because we
                     >     already have a separate "vendor" project.
                     >
                     >      >  - With shading, it is hard (impossible?)
                    to step into dependency
                     >     code in IntelliJ's debugger, because the
                    actual symbol at runtime
                     >     does not match what is in the external jars
                     >
                     >     This would be solved by releasing the sources
                    of the shaded jars.
                     >      From a
                     >     legal perspective, this could be problematic
                    as alluded to here:
                     > https://github.com/apache/flink-shaded/issues/25
                     >
                     >     -Max
                     >
                     >     On 20.10.18 01:11, Lukasz Cwik wrote:
                     >      > I have tried several times to improve the
                    build system and intellij
                     >      > integration and each attempt ended with
                    little progress when dealing
                     >      > with vendored code. My latest attempt has
                    been the most promising
                     >     where
                     >      > I take the vendored classes/jars and
                    decompile them generating the
                     >      > source that Intellij can then use. I have
                    a branch[1] that
                     >     demonstrates
                     >      > the idea. It works pretty well (and up
                    until a change where we
                     >     started
                     >      > vendoring gRPC, was impractical to do.
                    Instructions to try it out
                     >     are:
                     >      >
                     >      > // Clean up any remnants of prior
                    builds/intellij projects
                     >      > git clean -fdx
                     >      > // Generated the source for
                    vendored/shaded modules
                     >      > ./gradlew decompile
                     >      >
                     >      > // Remove the "generated" Java sources for
                    protos so they don't
                     >     conflict with the decompiled sources.
                     >      > rm -rf
                    model/pipeline/build/generated/source/proto
                     >      > rm -rf
                    model/job-management/build/generated/source/proto
                     >      > rm -rf
                    model/fn-execution/build/generated/source/proto
                     >      > // Import the project into Intellij, most
                    code completion now
                     >     works still some issues with a few classes.
                     >      > // Note that the Java decompiler doesn't
                    generate valid source so
                     >     still need to delegate to Gradle for
                    build/run/test actions
                     >      > // Other decompilers may do a better/worse
                    job but haven't tried
                     >     them.
                     >      >
                     >      >
                     >      > The problems that I face are that the
                    generated Java source from the
                     >      > protos and the decompiled source from the
                    compiled version of that
                     >      > source post shading are both being
                    imported as content roots and
                     >     then
                     >      > conflict. Also, the CFR decompiler isn't
                    producing valid source, if
                     >      > people could try others and report their
                    mileage, we may find one
                     >     that
                     >      > works and then we would be able to use
                    intellij to build/run our
                     >     code
                     >      > and not need to delegate all our
                    build/run/test actions to Gradle.
                     >      >
                     >      > After all these attempts I have done,
                    vendoring the dependencies
                     >     outside
                     >      > of the project seems like a sane approach
                    and unless someone
                     >     wants to
                     >      > take a stab at the best progress I have
                    made above, I would go
                     >     with what
                     >      > Kenn is suggesting even though it will
                    mean that we will need to
                     >     perform
                     >      > releases every time we want to change the
                    version of one of our
                     >     vendored
                     >      > dependencies.
                     >      >
                     >      > 1:
                    https://github.com/lukecwik/incubator-beam/tree/intellij
                     >      >
                     >      >
                     >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
                    Knowles <[email protected] <mailto:[email protected]>
                     >     <mailto:[email protected] <mailto:[email protected]>>
                     >      > <mailto:[email protected]
                    <mailto:[email protected]> <mailto:[email protected]
                    <mailto:[email protected]>>>> wrote:
                     >      >
                     >      >     Another reason to push on this is to
                    get build times down.
                     >     Once only
                     >      >     generated proto classes use the shadow
                    plugin we'll cut the build
                     >      >     time in ~half? And there is no reason
                    to constantly re-vendor.
                     >      >
                     >      >     Kenn
                     >      >
                     >      >     On Fri, Oct 19, 2018 at 10:39 AM
                    Kenneth Knowles
                     >     <[email protected] <mailto:[email protected]>
                    <mailto:[email protected] <mailto:[email protected]>>
                     >      >     <mailto:[email protected]
                    <mailto:[email protected]> <mailto:[email protected]
                    <mailto:[email protected]>>>> wrote:
                     >      >
                     >      >         Hi all,
                     >      >
                     >      >         A while ago we had pretty good
                    consensus that we should
                     >     vendor
                     >      >         ("pre-shade") specific
                    dependencies, and there's start on it
                     >      >         here:
                    https://github.com/apache/beam/tree/master/vendor
                     >      >
                     >      >         IntelliJ notes:
                     >      >
                     >      >           - With shading, it is hard
                    (impossible?) to step into
                     >      >         dependency code in IntelliJ's
                    debugger, because the actual
                     >      >         symbol at runtime does not match
                    what is in the external jars
                     >      >
                     >      >
                     >      > Intellij can step through the classes if
                    they were published
                     >     outside the
                     >      > project since it can decompile them. The
                    source won't be legible.
                     >      > Decompiling the source as in my example
                    effectively shows the
                     >     same issue.
                     >      >
                     >      >           - With vendoring, if the
                    vendored dependencies are part
                     >     of the
                     >      >         project then IntelliJ gets
                    confused because it operates on
                     >      >         source, not the produced jars
                     >      >
                     >      >
                     >      > Yes, I tried several ways to get intellij
                    to ignore the source
                     >     and use
                     >      > the output jars but no luck.
                     >      >
                     >      >         The second one has a quick fix for
                    most cases*: don't
                     >     make the
                     >      >         vendored dep a subproject, but
                    just separately build and
                     >     publish
                     >      >         it. Since a vendored dependency
                    should change much more
                     >      >         infrequently (or if we bake the
                    version into the name,
                     >     ~never)
                     >      >         this means we publish once and
                    save headache and build
                     >     time forever.
                     >      >
                     >      >         WDYT? Have I overlooked something?
                    How about we set up
                     >     vendored
                     >      >         versions of guava, protobuf, grpc,
                    and publish them. We don't
                     >      >         have to actually start using them
                    yet, and can do it
                     >     incrementally.
                     >      >
                     >      >
                     >      > Currently we are relocating code depending
                    on the version string.
                     >     If the
                     >      > major version is >= 1, we use only the
                    major version within the
                     >     package
                     >      > string and rely on semantic versioning
                    provided by the dependency
                     >     to not
                     >      > break people. If the major version is 0,
                    we assume the dependency is
                     >      > unstable and use the full version as part
                    of the package string
                     >     during
                     >      > relocation.
                     >      >
                     >      > The downside of using the full version
                    string for relocated packages:
                     >      > 1) Users will end up with multiple copies
                    of dependencies that
                     >     differ
                     >      > only by the minor or patch version
                    increasing the size of their
                     >     application.
                     >      > 2) Bumping up the version of a dependency
                    now requires the import
                     >      > statement in all java files to be updated
                    (not too difficult with
                     >     some
                     >      > sed/grep skills)
                     >      >
                     >      > The upside of using the full version
                    string in the relocated package:
                     >      > 1) We don't have to worry about whether a
                    dependency maintains
                     >     semantic
                     >      > versioning which means our users won't
                    have to worry about that
                     >     either.
                     >      > 2) This increases the odds that a user
                    will load multiple slightly
                     >      > different versions of the same dependency
                    which is known to be
                     >      > incompatible in certain situations (e.g.
                    Netty 4.1.25 can't be on
                     >     the
                     >      > classpath with Netty 4.1.28 even though
                    they are both shaded due to
                     >      > issues of how JNI with tcnative works).
                     >      >
                     >      >
                     >      >         (side note: what do other projects
                    like Flink do?)
                     >      >
                     >      >         Kenn
                     >      >
                     >      >         *for generated proto classes, they
                    need to be altered after
                     >      >         being generated so shading happens
                    there, but actually only
                     >      >         relocation and the shared vendored
                    dep should work
                     >     elsewhere in
                     >      >         the project
                     >      >
                     >      >
                     >      > We could publish the protos and treat them
                    as "external"
                     >     dependencies
                     >      > within the Java projects which would also
                    remove this pain point.
                     >

Reply via email to