I think it makes sense for each vendored dependency to be self-contained as much as possible. It should keep it fairly simple. Things that cross their API surface cannot be hidden, of course. Jar size is not a concern IMO.
Kenn On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected]> wrote: > How should we handle the transitive dependencies of the things we want to > vendor? > > For example we use gRPC which depends on Guava 20 and we also use Calcite > which depends on Guava 19. > > Should the vendored gRPC/Calcite/... be self-contained so it contains all > its dependencies, hence vendored gRPC would contain Guava 20 and vendored > Calcite would contain Guava 19 (both under different namespaces)? > This leads to larger jars but less vendored dependencies to maintain. > > Or should we produce a vendored library for those that we want to share, > e.g. Guava 20 that could be reused across multiple vendored libraries? > Makes the vendoring process slightly more complicated, more dependencies > to maintain, smaller jars. > > Or should we produce a vendored library for each dependency? > Lots of vendoring needed, likely tooling required to be built to maintain > this. > > > > > On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]> wrote: > >> I actually created the subtasks by finding things shaded by at least one >> module. I think each one should definitely have an on list discussion that >> clarifies the target artifact, namespace, version, possible complications, >> etc. >> >> My impression is that many many modules shade only Guava. So for build >> time and simplification that is a big win. >> >> Kenn >> >> On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]> wrote: >> >>> +1 for separate artifacts >>> >>> I would request that we explicitly discuss and agree which dependencies >>> we vendor though. >>> >>> Not everything listed in the JIRA subtasks is currently relocated. >>> >>> Thomas >>> >>> >>> On Tue, Oct 23, 2018 at 8:04 AM David Morávek <[email protected]> >>> wrote: >>> >>>> +1 This should improve build times a lot. It would be great if vendored >>>> deps could stay in the main repository. >>>> >>>> D. >>>> >>>> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <[email protected]> >>>> wrote: >>>> >>>>> Looks great, Kenn! >>>>> >>>>> > Max: what is the story behind having a separate flink-shaded repo? >>>>> Did that make it easier to manage in some way? >>>>> >>>>> Better separation of concerns, but I don't think releasing the shaded >>>>> artifacts from the main repo is a problem. I'd even prefer not to >>>>> split >>>>> up the repo because it makes updates to the vendored dependencies >>>>> slightly easier. >>>>> >>>>> On 23.10.18 03:25, Kenneth Knowles wrote: >>>>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to >>>>> > collect sub-tasks. This has enough upsides throughout lots of areas >>>>> of >>>>> > the project that even though it is not glamorous it seems pretty >>>>> > valuable to start on immediately. And I want to find out if there's >>>>> a >>>>> > pitfall lurking. >>>>> > >>>>> > Max: what is the story behind having a separate flink-shaded repo? >>>>> Did >>>>> > that make it easier to manage in some way? >>>>> > >>>>> > Kenn >>>>> > >>>>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <[email protected] >>>>> > <mailto:[email protected]>> wrote: >>>>> > >>>>> > +1 for publishing vendored Jars independently. It will improve >>>>> build >>>>> > time and ease IntelliJ integration. >>>>> > >>>>> > Flink also publishes shaded dependencies separately: >>>>> > >>>>> > - https://github.com/apache/flink-shaded >>>>> > - https://issues.apache.org/jira/browse/FLINK-6529 >>>>> > >>>>> > AFAIK their main motivation was to get rid of duplicate shaded >>>>> classes >>>>> > on the classpath. We don't appear to have that problem because we >>>>> > already have a separate "vendor" project. >>>>> > >>>>> > > - With shading, it is hard (impossible?) to step into >>>>> dependency >>>>> > code in IntelliJ's debugger, because the actual symbol at runtime >>>>> > does not match what is in the external jars >>>>> > >>>>> > This would be solved by releasing the sources of the shaded jars. >>>>> > From a >>>>> > legal perspective, this could be problematic as alluded to here: >>>>> > https://github.com/apache/flink-shaded/issues/25 >>>>> > >>>>> > -Max >>>>> > >>>>> > On 20.10.18 01:11, Lukasz Cwik wrote: >>>>> > > I have tried several times to improve the build system and >>>>> intellij >>>>> > > integration and each attempt ended with little progress when >>>>> dealing >>>>> > > with vendored code. My latest attempt has been the most >>>>> promising >>>>> > where >>>>> > > I take the vendored classes/jars and decompile them >>>>> generating the >>>>> > > source that Intellij can then use. I have a branch[1] that >>>>> > demonstrates >>>>> > > the idea. It works pretty well (and up until a change where we >>>>> > started >>>>> > > vendoring gRPC, was impractical to do. Instructions to try it >>>>> out >>>>> > are: >>>>> > > >>>>> > > // Clean up any remnants of prior builds/intellij projects >>>>> > > git clean -fdx >>>>> > > // Generated the source for vendored/shaded modules >>>>> > > ./gradlew decompile >>>>> > > >>>>> > > // Remove the "generated" Java sources for protos so they >>>>> don't >>>>> > conflict with the decompiled sources. >>>>> > > rm -rf model/pipeline/build/generated/source/proto >>>>> > > rm -rf model/job-management/build/generated/source/proto >>>>> > > rm -rf model/fn-execution/build/generated/source/proto >>>>> > > // Import the project into Intellij, most code completion now >>>>> > works still some issues with a few classes. >>>>> > > // Note that the Java decompiler doesn't generate valid >>>>> source so >>>>> > still need to delegate to Gradle for build/run/test actions >>>>> > > // Other decompilers may do a better/worse job but haven't >>>>> tried >>>>> > them. >>>>> > > >>>>> > > >>>>> > > The problems that I face are that the generated Java source >>>>> from the >>>>> > > protos and the decompiled source from the compiled version of >>>>> that >>>>> > > source post shading are both being imported as content roots >>>>> and >>>>> > then >>>>> > > conflict. Also, the CFR decompiler isn't producing valid >>>>> source, if >>>>> > > people could try others and report their mileage, we may find >>>>> one >>>>> > that >>>>> > > works and then we would be able to use intellij to build/run >>>>> our >>>>> > code >>>>> > > and not need to delegate all our build/run/test actions to >>>>> Gradle. >>>>> > > >>>>> > > After all these attempts I have done, vendoring the >>>>> dependencies >>>>> > outside >>>>> > > of the project seems like a sane approach and unless someone >>>>> > wants to >>>>> > > take a stab at the best progress I have made above, I would go >>>>> > with what >>>>> > > Kenn is suggesting even though it will mean that we will need >>>>> to >>>>> > perform >>>>> > > releases every time we want to change the version of one of >>>>> our >>>>> > vendored >>>>> > > dependencies. >>>>> > > >>>>> > > 1: https://github.com/lukecwik/incubator-beam/tree/intellij >>>>> > > >>>>> > > >>>>> > > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles < >>>>> [email protected] >>>>> > <mailto:[email protected]> >>>>> > > <mailto:[email protected] <mailto:[email protected]>>> wrote: >>>>> > > >>>>> > > Another reason to push on this is to get build times down. >>>>> > Once only >>>>> > > generated proto classes use the shadow plugin we'll cut >>>>> the build >>>>> > > time in ~half? And there is no reason to constantly >>>>> re-vendor. >>>>> > > >>>>> > > Kenn >>>>> > > >>>>> > > On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles >>>>> > <[email protected] <mailto:[email protected]> >>>>> > > <mailto:[email protected] <mailto:[email protected]>>> wrote: >>>>> > > >>>>> > > Hi all, >>>>> > > >>>>> > > A while ago we had pretty good consensus that we >>>>> should >>>>> > vendor >>>>> > > ("pre-shade") specific dependencies, and there's >>>>> start on it >>>>> > > here: >>>>> https://github.com/apache/beam/tree/master/vendor >>>>> > > >>>>> > > IntelliJ notes: >>>>> > > >>>>> > > - With shading, it is hard (impossible?) to step >>>>> into >>>>> > > dependency code in IntelliJ's debugger, because the >>>>> actual >>>>> > > symbol at runtime does not match what is in the >>>>> external jars >>>>> > > >>>>> > > >>>>> > > Intellij can step through the classes if they were published >>>>> > outside the >>>>> > > project since it can decompile them. The source won't be >>>>> legible. >>>>> > > Decompiling the source as in my example effectively shows the >>>>> > same issue. >>>>> > > >>>>> > > - With vendoring, if the vendored dependencies are >>>>> part >>>>> > of the >>>>> > > project then IntelliJ gets confused because it >>>>> operates on >>>>> > > source, not the produced jars >>>>> > > >>>>> > > >>>>> > > Yes, I tried several ways to get intellij to ignore the source >>>>> > and use >>>>> > > the output jars but no luck. >>>>> > > >>>>> > > The second one has a quick fix for most cases*: don't >>>>> > make the >>>>> > > vendored dep a subproject, but just separately build >>>>> and >>>>> > publish >>>>> > > it. Since a vendored dependency should change much >>>>> more >>>>> > > infrequently (or if we bake the version into the name, >>>>> > ~never) >>>>> > > this means we publish once and save headache and build >>>>> > time forever. >>>>> > > >>>>> > > WDYT? Have I overlooked something? How about we set up >>>>> > vendored >>>>> > > versions of guava, protobuf, grpc, and publish them. >>>>> We don't >>>>> > > have to actually start using them yet, and can do it >>>>> > incrementally. >>>>> > > >>>>> > > >>>>> > > Currently we are relocating code depending on the version >>>>> string. >>>>> > If the >>>>> > > major version is >= 1, we use only the major version within >>>>> the >>>>> > package >>>>> > > string and rely on semantic versioning provided by the >>>>> dependency >>>>> > to not >>>>> > > break people. If the major version is 0, we assume the >>>>> dependency is >>>>> > > unstable and use the full version as part of the package >>>>> string >>>>> > during >>>>> > > relocation. >>>>> > > >>>>> > > The downside of using the full version string for relocated >>>>> packages: >>>>> > > 1) Users will end up with multiple copies of dependencies that >>>>> > differ >>>>> > > only by the minor or patch version increasing the size of >>>>> their >>>>> > application. >>>>> > > 2) Bumping up the version of a dependency now requires the >>>>> import >>>>> > > statement in all java files to be updated (not too difficult >>>>> with >>>>> > some >>>>> > > sed/grep skills) >>>>> > > >>>>> > > The upside of using the full version string in the relocated >>>>> package: >>>>> > > 1) We don't have to worry about whether a dependency maintains >>>>> > semantic >>>>> > > versioning which means our users won't have to worry about >>>>> that >>>>> > either. >>>>> > > 2) This increases the odds that a user will load multiple >>>>> slightly >>>>> > > different versions of the same dependency which is known to be >>>>> > > incompatible in certain situations (e.g. Netty 4.1.25 can't >>>>> be on >>>>> > the >>>>> > > classpath with Netty 4.1.28 even though they are both shaded >>>>> due to >>>>> > > issues of how JNI with tcnative works). >>>>> > > >>>>> > > >>>>> > > (side note: what do other projects like Flink do?) >>>>> > > >>>>> > > Kenn >>>>> > > >>>>> > > *for generated proto classes, they need to be altered >>>>> after >>>>> > > being generated so shading happens there, but >>>>> actually only >>>>> > > relocation and the shared vendored dep should work >>>>> > elsewhere in >>>>> > > the project >>>>> > > >>>>> > > >>>>> > > We could publish the protos and treat them as "external" >>>>> > dependencies >>>>> > > within the Java projects which would also remove this pain >>>>> point. >>>>> > >>>>> >>>>
