Hi All,

FYI: Since no one has objections, we will be going ahead with the present approach and will merge by tomorrow EOD.

Thanks, everyone!
-Ayush

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <bra...@apache.org> wrote:
>
> Hi Sree Vaddi, Owen, Stack, Duo Zhang,
>
> We can move forward based on your comments; we are just waiting for your
> reply. Hopefully all of your comments have been answered. (Unification we
> can take up in a parallel thread, as Vinay mentioned.)
>
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
>
>> Hi Sree,
>>
>>> apache/hadoop-thirdparty, how would it fit into ASF? As an Incubating
>>> Project? Or as a TLP? Or as a new project definition?
>>
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted on by the Hadoop PMC as per the ASF process.
>>
>>> The effort to streamline and put in an accepted standard for the
>>> dependencies that require shading seems beyond the siloed efforts of
>>> hadoop, hbase, etc.
>>
>>> I propose we bring all the decision makers from all these artifacts
>>> into one room and decide the best course of action. I am looking at it
>>> as: no project should ever have to shade any artifacts except as an
>>> absolutely necessary alternative.
>>
>> This is the ideal proposal for any project. But unfortunately some
>> projects take their own course based on need.
>>
>> In the current case of protobuf in Hadoop: the protobuf upgrade from
>> 2.5.0 (which is already EOL) was not taken up, to avoid downstream
>> failures. Since Hadoop is a platform, its dependencies get added to
>> downstream projects' classpaths, so any change in Hadoop's dependencies
>> will directly affect downstreams. Hadoop strictly follows backward
>> compatibility as far as possible. Though protobuf provides wire
>> compatibility between versions, it does not provide compatibility for
>> generated sources. Now, to support ARM, a protobuf upgrade is mandatory.
>> Using the shading technique, Hadoop internally can upgrade to a shaded
>> protobuf 3.x and still have the 2.5.0 protobuf (deprecated) for
>> downstreams.
>>
>> This shading is necessary to have both versions of protobuf supported:
>> 2.5.0 (non-shaded) for the downstream classpath and 3.x (shaded) for
>> Hadoop's internal usage. And this entire work is to be done before the
>> 3.3.0 release.
>>
>> So, though it is ideal to make a common approach for all projects, I
>> suggest that for Hadoop we go ahead as per the current approach. We can
>> also start a parallel effort to address these problems in a separate
>> discussion/proposal. Once a solution is available, we can revisit and
>> adopt the new solution accordingly in all such projects (e.g. HBase,
>> Hadoop, Ratis).
>>
>> -Vinay
>>
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ayush...@gmail.com> wrote:
>>>
>>> Hey Sree,
>>>
>>>> apache/hadoop-thirdparty, how would it fit into ASF? As an Incubating
>>>> Project? Or as a TLP? Or as a new project definition?
>>>
>>> A subproject of Apache Hadoop, having its own independent release
>>> cycles. Maybe you can put this into the same column as Ozone, or as
>>> Submarine (a couple of months ago).
>>>
>>> Unifying for all seems interesting, but each project is independent
>>> and has its own limitations and way of thinking. I don't think it
>>> would be an easy task to bring them all to the same table and get them
>>> to agree on common stuff.
>>>
>>> I guess this has been under discussion for quite long, and there
>>> hasn't been any other alternative suggested. Still, we can hold up for
>>> a week; if someone comes up with a better solution, great, else we can
>>> continue in the present direction.
>>>
>>> -Ayush
>>>
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_ch...@yahoo.com.invalid> wrote:
>>>
>>>> apache/hadoop-thirdparty, how would it fit into ASF? As an Incubating
>>>> Project? Or as a TLP? Or as a new project definition?
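[Editor's note: to make the two-classpath arrangement above concrete, a Hadoop module would declare both the shaded thirdparty artifact (for internal use) and plain protobuf-java 2.5.0 (for downstream compatibility). A minimal sketch of the pom fragments, using the coordinates discussed in this thread; the thirdparty version number is illustrative, and final artifact names may differ:]

```xml
<!-- Shaded protobuf for Hadoop's internal use; classes are relocated
     under org.apache.hadoop.thirdparty.com.google.protobuf -->
<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf</artifactId>
  <version>1.0.0</version> <!-- illustrative thirdparty release number -->
</dependency>

<!-- Plain protobuf kept in the tree only so downstream classpaths
     that expect com.google.protobuf 2.5.0 do not break -->
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>2.5.0</version>
</dependency>
```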
>>>>
>>>> The effort to streamline and put in an accepted standard for the
>>>> dependencies that require shading seems beyond the siloed efforts of
>>>> hadoop, hbase, etc.
>>>>
>>>> I propose we bring all the decision makers from all these artifacts
>>>> into one room and decide the best course of action. I am looking at it
>>>> as: no project should ever have to shade any artifacts except as an
>>>> absolutely necessary alternative.
>>>>
>>>> Thank you.
>>>> /Sree
>>>>
>>>> On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>
>>>> Hi,
>>>> Sorry for the late reply.
>>>>
>>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>>>> HBase as an example, it looks like everything that is known to
>>>>>>> break a lot after an update gets shaded into the hbase-thirdparty
>>>>>>> artifact: guava, netty, etc. Is the purpose to isolate these
>>>>>>> naughty dependencies?
>>>>
>>>> Yes, shading is to isolate these naughty dependencies from the
>>>> downstream classpath and have independent control over their upgrades
>>>> without breaking downstreams.
>>>>
>>>> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to
>>>> create the shaded protobuf jar is ready to merge.
>>>>
>>>> Please take a look if interested; it will be merged maybe after two
>>>> days if there are no objections.
>>>>
>>>> -Vinay
>>>>
>>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
>>>>
>>>>> Hi, I am late to this but I am keen to understand more.
>>>>>
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>> HBase as an example, it looks like everything that is known to break
>>>>> a lot after an update gets shaded into the hbase-thirdparty artifact:
>>>>> guava, netty, etc.
>>>>> Is the purpose to isolate these naughty dependencies?
>>>>>
>>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have updated the PR as per @Owen O'Malley <owen.omal...@gmail.com>'s
>>>>>> suggestions:
>>>>>>
>>>>>> i. Renamed the module to 'hadoop-shaded-protobuf37'.
>>>>>> ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'.
>>>>>>
>>>>>> Please review!
>>>>>>
>>>>>> Thanks,
>>>>>> -Vinay
>>>>>>
>>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎 (Duo Zhang) <palomino...@gmail.com> wrote:
>>>>>>
>>>>>>> For HBase we have a separate repo for hbase-thirdparty:
>>>>>>>
>>>>>>> https://github.com/apache/hbase-thirdparty
>>>>>>>
>>>>>>> We publish the artifacts to nexus, so we do not need to include
>>>>>>> binaries in our git repo; downstreams just add a dependency in the
>>>>>>> pom:
>>>>>>>
>>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>>>>>>>
>>>>>>> And it has its own release cycles, releasing only when there are
>>>>>>> special requirements or we want to upgrade some of the
>>>>>>> dependencies. This is the vote thread for the newest release, where
>>>>>>> we want to provide a shaded gson for jdk7:
>>>>>>>
>>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Vinayakumar B <vinayakum...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
>>>>>>>
>>>>>>>> Please find replies inline.
>>>>>>>>
>>>>>>>> -Vinay
>>>>>>>>
>>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omal...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm very unhappy with this direction. In particular, I don't
>>>>>>>>> think git is a good place for distribution of binary artifacts.
>>>>>>>>> Furthermore, the PMC shouldn't be releasing anything without a
>>>>>>>>> release vote.
>>>>>>>>
>>>>>>>> The proposed solution doesn't release any binaries in git. It's
>>>>>>>> actually a complete sub-project which follows the entire release
>>>>>>>> process, including a VOTE in public. I have mentioned already that
>>>>>>>> the release process is similar to Hadoop's. To be specific, it uses
>>>>>>>> the (almost) same script used in Hadoop to generate artifacts, sign
>>>>>>>> them and deploy them to the staging repository. Please let me know
>>>>>>>> if I am conveying anything wrong.
>>>>>>>>
>>>>>>>>> I'd propose that we make a third party module that contains the
>>>>>>>>> *source* of the pom files to build the relocated jars. This
>>>>>>>>> should absolutely be treated as a last resort for the mostly
>>>>>>>>> Google projects that regularly break binary compatibility
>>>>>>>>> (e.g. Protobuf & Guava).
>>>>>>>>
>>>>>>>> The same has been implemented in the PR
>>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please check
>>>>>>>> and let me know if I misunderstood. Yes, this is the last option
>>>>>>>> we have, AFAIK.
>>>>>>>>
>>>>>>>>> In terms of naming, I'd propose something like:
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
>>>>>>>>> org.apache.hadoop.thirdparty.guava28
>>>>>>>>>
>>>>>>>>> In particular, I think we absolutely need to include the version
>>>>>>>>> of the underlying project. On the other hand, since we should not
>>>>>>>>> be shading *everything*, we can drop the leading com.google.
>>>>>>>>
>>>>>>>> IMO, this naming convention makes it easy to identify the
>>>>>>>> underlying project, but it will be difficult to maintain going
>>>>>>>> forward if the underlying project's version changes. Since the
>>>>>>>> thirdparty module has its own releases, each of those releases can
>>>>>>>> be mapped to a specific version of the underlying project. Even
>>>>>>>> the binary artifact can include a MANIFEST with underlying project
>>>>>>>> details, as per Steve's suggestion on HADOOP-13363. That said, if
>>>>>>>> you still prefer to have the project number in the artifact id, it
>>>>>>>> can be done.
>>>>>>>>
>>>>>>>>> The Hadoop project can make releases of the thirdparty module:
>>>>>>>>>
>>>>>>>>> <dependency>
>>>>>>>>>   <groupId>org.apache.hadoop</groupId>
>>>>>>>>>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>>>>>>>>>   <version>1.0</version>
>>>>>>>>> </dependency>
>>>>>>>>>
>>>>>>>>> Note that the version has to be the hadoop-thirdparty release
>>>>>>>>> number, which is part of why you need to have the underlying
>>>>>>>>> version in the artifact name. These we can push to Maven Central
>>>>>>>>> as new releases from Hadoop.
>>>>>>>>
>>>>>>>> Exactly; the same has been implemented in the PR. The
>>>>>>>> hadoop-thirdparty module has its own releases. And in the HADOOP
>>>>>>>> Jira, thirdparty versions can be differentiated using the prefix
>>>>>>>> "thirdparty-".
>>>>>>>>
>>>>>>>> The same solution is being followed in HBase. Maybe people
>>>>>>>> involved in HBase can add some points here.
>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> .. Owen
>>>>>>>>>
>>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I wanted to discuss the separate repo for thirdparty
>>>>>>>>>> dependencies which we need to shade and include in Hadoop
>>>>>>>>>> components' jars.
>>>>>>>>>>
>>>>>>>>>> Apologies for the big text ahead, but this needs a clear
>>>>>>>>>> explanation!
>>>>>>>>>>
>>>>>>>>>> Right now the most needed such dependency is protobuf.
>>>>>>>>>> The protobuf dependency was not upgraded from 2.5.0 onwards for
>>>>>>>>>> fear that downstream builds, which depend on the transitive
>>>>>>>>>> protobuf dependency coming from Hadoop's jars, may fail with the
>>>>>>>>>> upgrade. Apparently protobuf does not guarantee source
>>>>>>>>>> compatibility, though it guarantees wire compatibility between
>>>>>>>>>> versions. Because of this behavior, a version upgrade may cause
>>>>>>>>>> breakage in known and unknown (private?) downstreams.
>>>>>>>>>>
>>>>>>>>>> So to tackle this, we came up with the following proposal in
>>>>>>>>>> HADOOP-13363.
>>>>>>>>>>
>>>>>>>>>> Luckily, as far as I know, no APIs, either public to users or
>>>>>>>>>> between Hadoop processes, directly use protobuf classes in
>>>>>>>>>> signatures. (If any exist, please let us know.)
>>>>>>>>>>
>>>>>>>>>> Proposal:
>>>>>>>>>> ------------
>>>>>>>>>>
>>>>>>>>>> 1. Create artifact(s) which contain shaded dependencies. All
>>>>>>>>>> such shading/relocation will be under the known prefix
>>>>>>>>>> **org.apache.hadoop.thirdparty.**.
>>>>>>>>>> 2. Starting with the protobuf jar (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf), all
>>>>>>>>>> **com.google.protobuf** classes will be relocated to
>>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>>>>>>>>> 3. Hadoop modules which need protobuf as a dependency will add
>>>>>>>>>> this shaded artifact as a dependency (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>>>>>>>>> 4. All previous usages of "com.google.protobuf" will be
>>>>>>>>>> relocated to "org.apache.hadoop.thirdparty.com.google.protobuf"
>>>>>>>>>> in the code and committed. Please note, this replacement is
>>>>>>>>>> one-time, directly in the source code, NOT during compile and
>>>>>>>>>> package.
>>>>>>>>>> 5. Once all usages of "com.google.protobuf" are relocated,
>>>>>>>>>> Hadoop doesn't care which version of the original
>>>>>>>>>> "protobuf-java" is in the dependency tree.
>>>>>>>>>> 6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
>>>>>>>>>> not to break the downstreams. But Hadoop will actually be using
>>>>>>>>>> the latest protobuf present in
>>>>>>>>>> "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>>>>>>>>> 7. Coming back to the separate repo, the following are the main
>>>>>>>>>> reasons for keeping the shaded dependency artifact in a separate
>>>>>>>>>> repo instead of a submodule.
>>>>>>>>>> 7a. These artifacts need not be built all the time. They need to
>>>>>>>>>> be built only when there is a change in the dependency version
>>>>>>>>>> or the build process.
>>>>>>>>>> 7b. If added as a submodule in the Hadoop repo,
>>>>>>>>>> maven-shade-plugin:shade executes only in the package phase.
>>>>>>>>>> That means "mvn compile" or "mvn test-compile" would fail, as
>>>>>>>>>> this artifact would not yet have the relocated classes; instead
>>>>>>>>>> it would have the original classes, resulting in a compilation
>>>>>>>>>> failure. The workaround, building the thirdparty submodule first
>>>>>>>>>> and excluding the "thirdparty" submodule in other executions,
>>>>>>>>>> would be a complex process compared to keeping it in a separate
>>>>>>>>>> repo.
>>>>>>>>>> 7c. The separate repo will be a subproject of Hadoop, using the
>>>>>>>>>> same HADOOP jira project, with different versioning prefixed
>>>>>>>>>> with "thirdparty-" (ex: thirdparty-1.0.0).
>>>>>>>>>> 7d. The separate repo will have the same release process as
>>>>>>>>>> Hadoop.
>>>>>>>>>>
>>>>>>>>>> HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>>>>>>>>>> is an umbrella jira tracking the changes for the protobuf
>>>>>>>>>> upgrade.
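[Editor's note: step 2 of the proposal relies on maven-shade-plugin's class relocation. A minimal sketch of the plugin configuration in the shaded artifact's pom, simplified for illustration; the actual PR may add filters, transformers, and other settings:]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade binds to the package phase, which is why a same-repo
           submodule would break plain "mvn compile" (point 7b above) -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- rewrite com.google.protobuf.* bytecode references
                 to the hadoop thirdparty prefix -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```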
>>>>>>>>>>
>>>>>>>>>> A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
>>>>>>>>>> been raised for the separate repo creation in HADOOP-16595
>>>>>>>>>> (https://issues.apache.org/jira/browse/HADOOP-16595).
>>>>>>>>>>
>>>>>>>>>> Please provide your inputs on the proposal and review the PR so
>>>>>>>>>> we can proceed with the proposal.
>>>>>>>>>>
>>>>>>>>>> -Thanks,
>>>>>>>>>> Vinay
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <vino...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Moving the thread to the dev lists.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> +Vinod
>>>>>>>>>>>
>>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Marton,
>>>>>>>>>>>>
>>>>>>>>>>>> The newly created 'hadoop-thirdparty' repo is empty right now.
>>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will
>>>>>>>>>>>> be tracked in the HADOOP-13363 umbrella jira. Please feel free
>>>>>>>>>>>> to join the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo, so
>>>>>>>>>>>> I think right now we are good to go.
>>>>>>>>>>>>
>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <e...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure it's defined when a vote is required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Personally I think it's a big enough change to send a
>>>>>>>>>>>>> notification to the dev lists with a 'lazy consensus'
>>>>>>>>>>>>> closure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marton
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and
>>>>>>>>>>>>>> maybe more in future) will be kept as a shaded artifact in a
>>>>>>>>>>>>>> separate repo, which will be referred to as a dependency in
>>>>>>>>>>>>>> hadoop modules. This approach avoids shading every submodule
>>>>>>>>>>>>>> during the build.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the question is: is any VOTE required before asking to
>>>>>>>>>>>>>> create a git repo?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the self-serve platform
>>>>>>>>>>>>>> https://gitbox.apache.org/setup/newrepo.html I can see that
>>>>>>>>>>>>>> the requester should be a PMC member.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Wanted to confirm here first.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: private-unsubscr...@hadoop.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: private-h...@hadoop.apache.org
>
> --
> --Brahma Reddy Battula